Towards High Performance One-Stage Human Pose Estimation
- URL: http://arxiv.org/abs/2301.04842v1
- Date: Thu, 12 Jan 2023 07:02:17 GMT
- Title: Towards High Performance One-Stage Human Pose Estimation
- Authors: Ling Li, Lin Zhao, Linhao Xu, Jie Xu
- Abstract summary: Mask RCNN can largely improve the efficiency by conducting person detection and pose estimation in a single framework.
In this paper, we aim to largely advance the human pose estimation results of Mask-RCNN and still keep the efficiency.
- Score: 13.220521786778544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Making top-down human pose estimation methods deliver both good
performance and high efficiency is appealing. Mask RCNN can largely improve the efficiency
by conducting person detection and pose estimation in a single framework, as
the features provided by the backbone are able to be shared by the two tasks.
However, the performance is not as good as traditional two-stage methods. In
this paper, we aim to largely advance the human pose estimation results of
Mask-RCNN and still keep the efficiency. Specifically, we make improvements on
the whole process of pose estimation, which contains feature extraction and
keypoint detection. The feature extraction stage is designed to provide sufficient and
valuable pose information. Then, we introduce a Global Context Module into
the keypoint detection branch to enlarge the receptive field, which is crucial
to successful human pose estimation. On the COCO val2017 set, our model using
the ResNet-50 backbone achieves an AP of 68.1, which is 2.6 higher than Mask
RCNN (AP of 65.5). Compared to the classic two-stage top-down method
SimpleBaseline, our model largely narrows the performance gap (68.1 AP vs. 68.9
AP) with a much faster inference speed (77 ms vs. 168 ms), demonstrating the
effectiveness of the proposed method. Code is available at:
https://github.com/lingl_space/maskrcnn_keypoint_refined.
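The abstract's key architectural idea is the Global Context Module added to the keypoint detection branch. Below is a minimal PyTorch sketch of a GCNet-style global context block that illustrates the general mechanism; the channel width, reduction ratio, and exact placement in the branch are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class GlobalContextBlock(nn.Module):
    """GCNet-style global context block (illustrative sketch, not the paper's exact module)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # 1x1 conv that produces one attention logit per spatial position.
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)
        # Bottleneck transform applied to the pooled global context vector.
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Softmax over all positions: every pixel aggregates the whole feature map,
        # which is what enlarges the effective receptive field.
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)           # (B, 1, HW)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))  # (B, C, 1)
        context = self.transform(context.view(b, c, 1, 1))                 # (B, C, 1, 1)
        return x + context  # broadcast-add the global context to every location


if __name__ == "__main__":
    feat = torch.randn(2, 256, 64, 48)   # hypothetical keypoint-branch feature map
    out = GlobalContextBlock(256)(feat)
    print(out.shape)                     # torch.Size([2, 256, 64, 48])
```

Because the pooled context is added back to every location, a single such block gives each position a view of the entire feature map at negligible computational cost, which is why this style of module is a common choice for widening the receptive field of a keypoint head.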
Related papers
- SHaRPose: Sparse High-Resolution Representation for Human Pose
Estimation [39.936860590417346]
We propose a framework that only uses Sparse High-resolution Representations for human Pose estimation (SHaRPose).
Our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and infers $1.4\times$ faster than ViTPose-Base.
arXiv Detail & Related papers (2023-12-17T16:29:16Z) - Effective Whole-body Pose Estimation with Two-stages Distillation [52.92064408970796]
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an image.
We present a two-stage pose Distillation method for Whole-body Pose estimators, named DWPose, to improve their effectiveness and efficiency (see the distillation sketch after this list).
arXiv Detail & Related papers (2023-07-29T03:49:28Z) - Rethinking pose estimation in crowds: overcoming the detection
information-bottleneck and ambiguity [46.10812760258666]
Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms.
We propose a novel pipeline called bottom-up conditioned top-down pose estimation.
We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks.
arXiv Detail & Related papers (2023-06-13T16:14:40Z) - PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z) - PoseRAC: Pose Saliency Transformer for Repetitive Action Counting [56.34379680390869]
We introduce Pose Saliency Representation, which efficiently represents each action using only two salient poses instead of redundant frames.
We also introduce PoseRAC, which is based on this representation and achieves state-of-the-art performance.
Our lightweight model is highly efficient, requiring only 20 minutes for training on a GPU, and infers nearly 10x faster compared to previous methods.
arXiv Detail & Related papers (2023-03-15T08:51:17Z) - Rethinking Keypoint Representations: Modeling Keypoints and Poses as
Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z) - SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up
Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
Besides, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that the two tasks complement each other in a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z) - EfficientPose: Efficient Human Pose Estimation with Neural Architecture
Search [47.30243595690131]
We propose an efficient framework for human pose estimation consisting of two parts: an efficient backbone and an efficient head.
Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model.
arXiv Detail & Related papers (2020-12-13T15:38:38Z)