Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale
Persons
- URL: http://arxiv.org/abs/2208.11975v1
- Date: Thu, 25 Aug 2022 10:09:10 GMT
- Title: Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale
Persons
- Authors: Yu Cheng, Yihao Ai, Bo Wang, Xinchao Wang, Robby T. Tan
- Abstract summary: In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the state of the art (SOTA) on the small-scale persons subset of COCO.
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the single-scale testing.
- Score: 75.86463396561744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-person 2D pose estimation, the bottom-up methods simultaneously
predict poses for all persons, and unlike the top-down methods, do not rely on
human detection. However, the accuracy of SOTA bottom-up methods is still
inferior to that of existing top-down methods. This is because the predicted
human poses are regressed from inconsistent human bounding box centers and
lack human-scale normalization, which makes the predicted poses inaccurate and
causes small-scale persons to be missed. To push the envelope of bottom-up
pose estimation, we first propose multi-scale training to enhance the
network's ability to handle scale variation with single-scale testing,
particularly for small-scale persons. Second, we introduce dual anatomical
centers (i.e., head and body), which allow us to predict human poses more
accurately and reliably, especially for small-scale persons. Moreover,
existing bottom-up methods use multi-scale testing to boost the accuracy of
pose estimation at the price of multiple additional forward passes, which
weakens the efficiency of bottom-up methods, their core strength over
top-down methods. By contrast, our multi-scale training enables the model to
predict high-quality poses in a single forward pass (i.e., single-scale
testing). Our method achieves 38.4\% improvement on bounding box precision and
39.1\% improvement on bounding box recall over the state of the art (SOTA) on
the challenging small-scale persons subset of COCO. For the human pose AP
evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the
single-scale testing. We also achieve the top performance (40.3 AP) on OCHuman
dataset in cross-dataset evaluation.
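The dual-anatomical-center idea can be illustrated with a minimal sketch: for each person, the network yields two candidate poses, one regressed from the head center and one from the body center, and the candidate whose center was detected more confidently is kept. The function name, array shapes, and selection rule below are hypothetical illustrations for intuition, not the authors' actual implementation.

```python
import numpy as np

def fuse_dual_center_poses(pose_head, conf_head, pose_body, conf_body):
    """Select between two candidate poses for one person.

    pose_head, pose_body: (K, 2) arrays of keypoint coordinates regressed
    from the head center and the body center, respectively.
    conf_head, conf_body: scalar confidences of the two center detections.
    Returns the pose whose anatomical center was detected more confidently.
    """
    return pose_head if conf_head >= conf_body else pose_body

# Toy example: 17 COCO keypoints per candidate pose.
pose_from_head = np.zeros((17, 2))
pose_from_body = np.ones((17, 2))
chosen = fuse_dual_center_poses(pose_from_head, 0.9, pose_from_body, 0.4)
```

In practice a bottom-up network would also produce the center heatmaps and per-center keypoint offsets; this sketch only shows the final selection step between the two center-based candidates.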
Related papers
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity [46.10812760258666]
Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms.
We propose a novel pipeline called bottom-up conditioned top-down pose estimation.
We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks.
arXiv Detail & Related papers (2023-06-13T16:14:40Z)
- 2D Human Pose Estimation with Explicit Anatomical Keypoints Structure Constraints [15.124606575017621]
We present a novel 2D human pose estimation method with explicit anatomical keypoints structure constraints.
Our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods.
Our method performs favorably against most existing bottom-up and top-down human pose estimation methods.
arXiv Detail & Related papers (2022-12-05T11:01:43Z)
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single-person pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks [33.974241749058585]
In multi-person pose estimation, human detection can be erroneous and human-joints grouping can be unreliable.
Existing top-down methods rely on human detection and thus suffer from these problems.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2021-04-05T07:05:21Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.