MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model
- URL: http://arxiv.org/abs/2302.08751v2
- Date: Mon, 8 May 2023 12:22:30 GMT
- Title: MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model
- Authors: Seunghyeon Seo, Jaeyoung Yoo, Jihye Hwang, Nojun Kwak
- Abstract summary: We propose a novel framework of single-stage instance-aware pose estimation by modeling the joint distribution of human keypoints.
Our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints.
- Score: 27.849059115252008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the major challenges in multi-person pose estimation is instance-aware
keypoint estimation. Previous methods address this problem by leveraging an
off-the-shelf detector, heuristic post-grouping process or explicit instance
identification process, hindering further improvements in inference speed,
which is an important factor for practical applications. From the statistical
point of view, those additional processes for identifying instances are
necessary to bypass learning the high-dimensional joint distribution of human
keypoints, which is a critical factor for another major challenge, the
occlusion scenario. In this work, we propose a novel framework of single-stage
instance-aware pose estimation by modeling the joint distribution of human
keypoints with a mixture density model, termed MDPose. Our MDPose estimates
the distribution of human keypoints' coordinates using a mixture density model
with an instance-aware keypoint head consisting simply of 8 convolutional
layers. It is trained by minimizing the negative log-likelihood of the ground
truth keypoints. Also, we propose a simple yet effective training strategy,
Random Keypoint Grouping (RKG), which significantly alleviates the underflow
problem, leading to successful learning of relations between keypoints. On
the OCHuman dataset, which consists of images with highly occluded people, our
MDPose achieves state-of-the-art performance by successfully learning the
high-dimensional joint distribution of human keypoints. Furthermore, our MDPose
shows a significant improvement in inference speed with competitive accuracy on
MS COCO, a widely-used human keypoint dataset, thanks to its much
simpler single-stage pipeline.
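The training objective described in the abstract can be illustrated with a minimal sketch. It assumes a mixture whose components are Gaussians with independent coordinates; the abstract does not state the component family, so that choice is an assumption, as are all function names here. The random grouping shown is only one plausible reading of Random Keypoint Grouping: the likelihood is evaluated on small random subsets of keypoints rather than on all coordinates at once, so each density stays further from floating-point underflow.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian (assumed component family)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def mixture_nll(coords, log_weights, means, stds, dims=None):
    """Negative log-likelihood of one pose under the mixture.

    coords      flat list [x1, y1, x2, y2, ...] of ground-truth coordinates
    log_weights log mixing coefficient of each component
    means, stds per-component lists with the same length as coords
    dims        optional subset of coordinate indices to evaluate
    """
    dims = list(range(len(coords))) if dims is None else list(dims)
    per_component = []
    for lw, mu, sd in zip(log_weights, means, stds):
        log_p = sum(gaussian_log_pdf(coords[d], mu[d], sd[d]) for d in dims)
        per_component.append(lw + log_p)
    return -log_sum_exp(per_component)


def random_keypoint_groups(num_keypoints, num_groups, rng):
    """Shuffle keypoint indices and deal them into num_groups groups."""
    order = list(range(num_keypoints))
    rng.shuffle(order)
    return [order[g::num_groups] for g in range(num_groups)]


def grouped_nll(coords, log_weights, means, stds, groups):
    """Sum of mixture NLLs, one per keypoint group (hypothetical RKG loss)."""
    total = 0.0
    for group in groups:
        # Each keypoint k owns coordinates 2k (x) and 2k + 1 (y).
        dims = [2 * k + offset for k in group for offset in (0, 1)]
        total += mixture_nll(coords, log_weights, means, stds, dims)
    return total
```

With 17 keypoints (the COCO keypoint count), `random_keypoint_groups(17, 4, random.Random(0))` yields four disjoint groups covering every keypoint. Working in log space with `log_sum_exp` is a standard safeguard in its own right; the grouping reduces the number of densities multiplied inside each component.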
Related papers
- SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method
for Autonomous Driving [3.3113002380233447]
We present a novel pedestrian pose keypoint completion method called the separation and dimensionality reduction-based generative adversarial imputation networks (SDR-GAIN).
The SDR-GAIN algorithm exhibits a remarkably short running time of approximately 0.4ms and boasts exceptional real-time performance.
arXiv Detail & Related papers (2023-06-06T09:35:56Z)
- 2D Human Pose Estimation with Explicit Anatomical Keypoints Structure
Constraints [15.124606575017621]
We present a novel 2D human pose estimation method with explicit anatomical keypoints structure constraints.
Our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods.
Our methods perform favorably against most existing bottom-up and top-down human pose estimation methods.
arXiv Detail & Related papers (2022-12-05T11:01:43Z)
- Direct Dense Pose Estimation [138.56533828316833]
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies.
Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner, first attempting to identify a bounding box for each person.
We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
arXiv Detail & Related papers (2022-04-04T06:14:38Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as
Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Greedy Offset-Guided Keypoint Grouping for Human Pose Estimation [31.468003041368814]
We employ an Hourglass Network to infer all the keypoints from different persons indiscriminately.
We greedily group the candidate keypoints into multiple human poses, utilizing the predicted guiding offsets.
Our approach is comparable to the state of the art on the challenging COCO dataset under fair conditions.
arXiv Detail & Related papers (2021-07-07T09:32:01Z)
- SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up
Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that they complement each other in a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z)
- Differentiable Hierarchical Graph Grouping for Multi-Person Pose
Estimation [95.72606536493548]
Multi-person pose estimation is challenging because it localizes body keypoints for multiple persons simultaneously.
We propose a novel differentiable Hierarchical Graph Grouping (HGG) method to learn the graph grouping in the bottom-up multi-person pose estimation task.
arXiv Detail & Related papers (2020-07-23T08:46:22Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View
Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning about human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM), which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.