SHaRPose: Sparse High-Resolution Representation for Human Pose
Estimation
- URL: http://arxiv.org/abs/2312.10758v1
- Date: Sun, 17 Dec 2023 16:29:16 GMT
- Title: SHaRPose: Sparse High-Resolution Representation for Human Pose
Estimation
- Authors: Xiaoqi An, Lin Zhao, Chen Gong, Nannan Wang, Di Wang, Jian Yang
- Abstract summary: We propose a framework that only uses Sparse High-resolution Representations for human Pose estimation (SHaRPose).
Our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and infers $1.4\times$ faster than ViTPose-Base.
- Score: 39.936860590417346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution representation is essential for achieving good performance in
human pose estimation models. To obtain such features, existing works utilize
high-resolution input images or fine-grained image tokens. However, this dense
high-resolution representation brings a significant computational burden. In
this paper, we address the following question: "Only sparse human keypoint
locations are detected for human pose estimation, is it really necessary to
describe the whole image in a dense, high-resolution manner?" Based on dynamic
transformer models, we propose a framework that only uses Sparse
High-resolution Representations for human Pose estimation (SHaRPose). In
detail, SHaRPose consists of two stages. At the coarse stage, the relations
between image regions and keypoints are dynamically mined while a coarse
estimation is generated. Then, a quality predictor is applied to decide whether
the coarse estimation results should be refined. At the fine stage, SHaRPose
builds sparse high-resolution representations only on the regions related to
the keypoints and provides refined high-precision human pose estimations.
Extensive experiments demonstrate the outstanding performance of the proposed
method. Specifically, compared to the state-of-the-art method ViTPose, our
model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and
76.7 AP (+0.5 AP) on the COCO test-dev set, and infers $1.4\times$ faster than
ViTPose-Base.
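The two-stage design described in the abstract can be read as a gated coarse-to-fine pipeline. The minimal PyTorch sketch below is assembled only from the abstract above: the module names (`coarse_encoder`, `fine_encoder`, `quality_predictor`, `keypoint_head`), the attention-based region selection, the top-k budget, and the quality threshold are all assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of SHaRPose's coarse-to-fine inference, based only on the
# abstract above; module names, shapes, and thresholds are assumptions.
import torch
import torch.nn as nn


class SHaRPoseSketch(nn.Module):
    def __init__(self, coarse_encoder, fine_encoder, keypoint_head,
                 quality_predictor, top_k_regions=16, quality_threshold=0.5):
        super().__init__()
        self.coarse_encoder = coarse_encoder        # ViT over coarse (low-res) patch tokens
        self.fine_encoder = fine_encoder            # ViT over the sparse high-res tokens
        self.keypoint_head = keypoint_head          # decodes features -> keypoint coordinates
        self.quality_predictor = quality_predictor  # scores the coarse estimate in [0, 1]
        self.top_k_regions = top_k_regions
        self.quality_threshold = quality_threshold

    @torch.no_grad()
    def forward(self, coarse_tokens, fine_tokens):
        # Coarse stage: encode low-resolution tokens, producing a coarse pose
        # estimate plus keypoint-to-region attention, i.e. which image regions
        # each keypoint depends on.
        # kpt_region_attn is assumed to have shape (num_keypoints, num_regions).
        coarse_feats, kpt_region_attn = self.coarse_encoder(coarse_tokens)
        coarse_pose = self.keypoint_head(coarse_feats)

        # Quality predictor: if the coarse estimate already looks good enough,
        # skip the fine stage entirely (assumes batch size 1 for simplicity).
        if self.quality_predictor(coarse_feats).item() > self.quality_threshold:
            return coarse_pose

        # Fine stage: keep only the high-resolution tokens of the regions most
        # attended by the keypoints; the rest of the image stays coarse.
        region_scores = kpt_region_attn.mean(dim=0)               # (num_regions,)
        keep = region_scores.topk(self.top_k_regions).indices
        sparse_fine_tokens = fine_tokens[:, keep]                 # (B, k, dim)

        fine_feats = self.fine_encoder(sparse_fine_tokens, coarse_feats)
        return self.keypoint_head(fine_feats)
```

Under these assumptions, the claimed speed-up has two sources: the quality gate lets easy images stop after the coarse stage, and the fine stage only ever processes the small set of selected high-resolution tokens.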
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z) - DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard in accuracy.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model [27.849059115252008]
We propose a novel framework of single-stage instance-aware pose estimation by modeling the joint distribution of human keypoints.
Our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints.
arXiv Detail & Related papers (2023-02-17T08:29:33Z) - Towards High Performance One-Stage Human Pose Estimation [13.220521786778544]
Mask-RCNN can greatly improve efficiency by conducting person detection and pose estimation in a single framework.
In this paper, we aim to substantially improve the pose estimation accuracy of Mask-RCNN while keeping its efficiency.
arXiv Detail & Related papers (2023-01-12T07:02:17Z) - 2D Human Pose Estimation with Explicit Anatomical Keypoints Structure
Constraints [15.124606575017621]
We present a novel 2D human pose estimation method with explicit anatomical keypoints structure constraints.
Our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods.
Our method performs favorably against most existing bottom-up and top-down human pose estimation methods.
arXiv Detail & Related papers (2022-12-05T11:01:43Z) - Rethinking Keypoint Representations: Modeling Keypoints and Poses as
Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z) - Super Resolution in Human Pose Estimation: Pixelated Poses to a
Resolution Result? [9.577509224534323]
We find that keypoint detection performance for low-resolution people improves once super-resolution (SR) is applied.
To exploit this, we introduce a novel Mask-RCNN approach that uses a segmentation area threshold to decide when to apply SR during the keypoint detection step (see the sketch after this list).
arXiv Detail & Related papers (2021-07-05T16:06:55Z) - Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive
Keypoint Estimates [76.51095823248104]
We present several schemes, rarely or only insufficiently studied before, for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps to guide pixel-wise keypoint regression instead of treating the two tasks separately.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)
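For the super-resolution entry above, the segmentation-area gate reduces to a simple rule: if a detected person's instance mask is small, super-resolve that crop before running keypoint detection. The sketch below only illustrates that rule; the `sr_model` and `keypoint_detector` callables and the threshold value are placeholders, not the paper's actual components.

```python
# Illustrative sketch of the segmentation-area threshold described in
# "Super Resolution in Human Pose Estimation": apply super-resolution only to
# people whose instance mask is small. The threshold and the sr_model /
# keypoint_detector callables are placeholders, not the paper's components.
import numpy as np


def estimate_keypoints(person_crop, person_mask, sr_model, keypoint_detector,
                       area_threshold=32 * 32):
    """Run keypoint detection, upscaling low-resolution people first."""
    mask_area = int(np.count_nonzero(person_mask))  # segmentation area in pixels
    if mask_area < area_threshold:
        # Low-resolution person: super-resolve the crop before detection.
        person_crop = sr_model(person_crop)
    return keypoint_detector(person_crop)
```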