Rethinking pose estimation in crowds: overcoming the detection
information-bottleneck and ambiguity
- URL: http://arxiv.org/abs/2306.07879v2
- Date: Sat, 30 Sep 2023 15:12:07 GMT
- Title: Rethinking pose estimation in crowds: overcoming the detection
information-bottleneck and ambiguity
- Authors: Mu Zhou and Lucas Stoffl and Mackenzie Weygandt Mathis and Alexander
Mathis
- Abstract summary: Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms.
We propose a novel pipeline called bottom-up conditioned top-down pose estimation.
We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks.
- Score: 46.10812760258666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Frequent interactions between individuals are a fundamental challenge for
pose estimation algorithms. Current pipelines either use an object detector
together with a pose estimator (top-down approach), or localize all body parts
first and then link them to predict the pose of individuals (bottom-up). Yet,
when individuals closely interact, top-down methods are ill-defined due to
overlapping individuals, and bottom-up methods often falsely infer connections
to distant body parts. Thus, we propose a novel pipeline called bottom-up
conditioned top-down pose estimation (BUCTD) that combines the strengths of
bottom-up and top-down methods. Specifically, we propose to use a bottom-up
model as the detector, which in addition to an estimated bounding box provides
a pose proposal that is fed as condition to an attention-based top-down model.
We demonstrate the performance and efficiency of our approach on animal and
human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform
previous state-of-the-art models by a significant margin. We achieve 78.5 AP on
CrowdPose and 48.5 AP on OCHuman, an improvement of 8.6% and 7.8% over the
prior art, respectively. Furthermore, we show that our method strongly improves
the performance on multi-animal benchmarks involving fish and monkeys. The code
is available at https://github.com/amathislab/BUCTD
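To make the pipeline concrete, below is a minimal sketch of the conditioning idea: the bottom-up stage supplies a bounding box and a pose proposal, the crop is taken from the box, and the proposal is rasterized into per-joint maps that enter the top-down network as extra input channels. This is an illustrative toy under assumed shapes, not the released implementation (which conditions an attention-based top-down model); the names ConditionedTopDown and rasterize_proposal are hypothetical.

```python
# Illustrative sketch of bottom-up conditioned top-down inference (hypothetical
# names and shapes; the official implementation is at the repository linked above).
import torch
import torch.nn as nn


class ConditionedTopDown(nn.Module):
    """Toy top-down head that consumes an image crop plus rasterized pose-proposal maps."""

    def __init__(self, num_joints: int = 17):
        super().__init__()
        # The crop (3 channels) and the proposal maps (num_joints channels) are
        # concatenated, so the first convolution sees 3 + num_joints input channels.
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + num_joints, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_joints, 1),  # one refined heatmap per joint
        )

    def forward(self, crop: torch.Tensor, proposal_maps: torch.Tensor) -> torch.Tensor:
        return self.backbone(torch.cat([crop, proposal_maps], dim=1))


def rasterize_proposal(pose_xy: torch.Tensor, hw: tuple) -> torch.Tensor:
    """Turn (num_joints, 2) proposal coordinates into one binary 'dot' map per joint."""
    num_joints = pose_xy.shape[0]
    maps = torch.zeros(1, num_joints, *hw)
    for j, (x, y) in enumerate(pose_xy.round().long().tolist()):
        if 0 <= y < hw[0] and 0 <= x < hw[1]:
            maps[0, j, y, x] = 1.0
    return maps


if __name__ == "__main__":
    h, w, num_joints = 64, 48, 17
    crop = torch.rand(1, 3, h, w)                                 # crop from the detector's box
    proposal = torch.rand(num_joints, 2) * torch.tensor([w, h])   # bottom-up pose proposal
    model = ConditionedTopDown(num_joints)
    heatmaps = model(crop, rasterize_proposal(proposal, (h, w)))
    print(heatmaps.shape)  # torch.Size([1, 17, 64, 48])
```

In the actual BUCTD models the condition enters an attention-based top-down network rather than a plain channel concatenation; see the repository above for the real architecture.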
Related papers
- AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking
in Real-Time [47.19339667836196]
We present AlphaPose, a system that can perform accurate whole-body pose estimation and tracking jointly while running in real time.
We show a significant improvement over current state-of-the-art methods in both speed and accuracy on COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose estimation dataset.
arXiv Detail & Related papers (2022-11-07T09:15:38Z)
- Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z)
- Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation [33.86986028882488]
Occlusion poses a major challenge to monocular multi-person 3D human pose estimation because occluders vary widely in shape, appearance, and position.
Existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning.
We develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation.
arXiv Detail & Related papers (2022-07-29T22:12:50Z)
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single-person pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- Direct Dense Pose Estimation [138.56533828316833]
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies.
Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner, first identifying a bounding box for each person.
We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
arXiv Detail & Related papers (2022-04-04T06:14:38Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
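As a rough illustration of the "keypoints and poses as objects" idea, each keypoint can be encoded as a small fixed-size box with its own class and each full pose as an enclosing box, so that both fit the target format of an ordinary single-stage anchor-based detector. This is a generic sketch only; KAPAO's actual target encoding and anchor assignment differ, and poses_to_detection_targets is a hypothetical helper.

```python
# Generic illustration of encoding keypoints and poses as detection "objects".
from typing import List, Tuple

Box = Tuple[int, float, float, float, float]  # (class_id, x1, y1, x2, y2)


def poses_to_detection_targets(
    poses: List[List[Tuple[float, float]]], keypoint_box_size: float = 8.0
) -> List[Box]:
    """Encode each keypoint as a small box (class = joint index + 1)
    and each full pose as its enclosing box (class = 0)."""
    targets: List[Box] = []
    half = keypoint_box_size / 2
    for pose in poses:
        xs, ys = zip(*pose)
        targets.append((0, min(xs), min(ys), max(xs), max(ys)))  # pose object
        for joint_id, (x, y) in enumerate(pose, start=1):
            targets.append((joint_id, x - half, y - half, x + half, y + half))
    return targets


if __name__ == "__main__":
    toy_pose = [(10.0, 20.0), (14.0, 25.0), (12.0, 40.0)]  # three joints
    for target in poses_to_detection_targets([toy_pose]):
        print(target)
```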
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
While this formulation performs well in sparser crowd scenes, its effectiveness degrades under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
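For context on the epipolar constraint mentioned above, the snippet below shows the standard symmetric epipolar distance used to test whether 2D detections in two calibrated views can correspond to the same 3D point. This is a textbook illustration, not this paper's specific crowd formulation; epipolar_residual is a hypothetical helper name.

```python
# Standard epipolar-constraint check for cross-view matching of 2D joint detections.
import numpy as np


def epipolar_residual(x1: np.ndarray, x2: np.ndarray, F: np.ndarray) -> float:
    """Symmetric epipolar distance between a point in view 1 and a point in view 2.

    x1, x2: pixel coordinates of shape (2,); F: 3x3 fundamental matrix, so that
    x2^T F x1 = 0 holds exactly for a noise-free true correspondence.
    """
    p1 = np.append(x1, 1.0)              # homogeneous coordinates
    p2 = np.append(x2, 1.0)
    line2 = F @ p1                       # epipolar line of x1 in view 2: (a, b, c)
    line1 = F.T @ p2                     # epipolar line of x2 in view 1
    algebraic = float(p2 @ F @ p1)
    d2 = abs(algebraic) / np.hypot(line2[0], line2[1])  # point-to-line distance, view 2
    d1 = abs(algebraic) / np.hypot(line1[0], line1[1])  # point-to-line distance, view 1
    return d1 + d2
```

Detections across views can then be matched (e.g., with the Hungarian algorithm) on a cost matrix of such residuals, with small residuals indicating consistency with a single 3D point.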
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
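The snippet below sketches what propagating features among joints with a skeleton-based GNN can look like in its simplest form: each joint's feature is averaged with those of its skeleton neighbours and passed through a learned projection. This is a generic graph-convolution step under assumed shapes, not HDNet's actual architecture; SkeletonGNNLayer is a hypothetical name.

```python
# Minimal skeleton-graph feature propagation (generic sketch, not HDNet itself).
import torch
import torch.nn as nn


class SkeletonGNNLayer(nn.Module):
    def __init__(self, adjacency: torch.Tensor, dim: int):
        super().__init__()
        # Row-normalized adjacency with self-loops: each joint aggregates its own
        # feature together with those of its skeleton neighbours.
        a = adjacency + torch.eye(adjacency.shape[0])
        self.register_buffer("norm_adj", a / a.sum(dim=1, keepdim=True))
        self.proj = nn.Linear(dim, dim)

    def forward(self, joint_feats: torch.Tensor) -> torch.Tensor:
        # joint_feats: (batch, num_joints, dim)
        return torch.relu(self.proj(self.norm_adj @ joint_feats))


if __name__ == "__main__":
    # Toy 4-joint chain skeleton: 0-1-2-3
    adj = torch.tensor([[0., 1., 0., 0.],
                        [1., 0., 1., 0.],
                        [0., 1., 0., 1.],
                        [0., 0., 1., 0.]])
    layer = SkeletonGNNLayer(adj, dim=16)
    print(layer(torch.rand(2, 4, 16)).shape)  # torch.Size([2, 4, 16])
```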
arXiv Detail & Related papers (2020-07-17T12:44:23Z)
- SMPR: Single-Stage Multi-Person Pose Regression [41.096103136666834]
A novel single-stage multi-person pose regression method, termed SMPR, is presented.
It follows the paradigm of dense prediction and predicts instance-aware keypoints from every location.
We show that our method not only outperforms existing single-stage methods but is also competitive with the latest bottom-up methods.
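To illustrate the dense-prediction paradigm in general terms, every feature-map location can predict an instance score plus per-joint offsets, and full poses are read off directly at high-scoring locations. This is not SMPR's exact decoder; decode_dense_poses is a hypothetical helper, and in practice a non-maximum-suppression step would merge duplicate predictions from nearby locations.

```python
# Generic single-stage dense pose decoding (illustrative only).
import numpy as np


def decode_dense_poses(scores: np.ndarray, offsets: np.ndarray,
                       stride: int = 4, score_thresh: float = 0.5) -> list:
    """scores: (H, W) per-location instance score.
    offsets: (H, W, num_joints, 2) pixel offsets from each location to every joint.
    Returns one (num_joints, 2) pose per location whose score exceeds the threshold."""
    poses = []
    ys, xs = np.where(scores > score_thresh)
    for y, x in zip(ys, xs):
        centre = np.array([x, y], dtype=np.float32) * stride  # map location back to pixels
        poses.append(centre + offsets[y, x])                  # broadcast over joints
    return poses


if __name__ == "__main__":
    h, w, num_joints = 8, 8, 17
    scores = np.zeros((h, w))
    scores[3, 5] = 0.9                                   # one confident location
    offsets = np.random.randn(h, w, num_joints, 2) * 5.0
    print(decode_dense_poses(scores, offsets)[0].shape)  # (17, 2)
```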
arXiv Detail & Related papers (2020-06-28T11:26:38Z)