Dual networks based 3D Multi-Person Pose Estimation from Monocular Video
- URL: http://arxiv.org/abs/2205.00748v2
- Date: Wed, 4 May 2022 07:08:12 GMT
- Title: Dual networks based 3D Multi-Person Pose Estimation from Monocular Video
- Authors: Yu Cheng, Bo Wang, Robby T. Tan
- Abstract summary: Multi-person 3D pose estimation is more challenging than single-person estimation.
Existing top-down approaches suffer from detection errors, while bottom-up approaches struggle with persons at small scales.
We propose integrating the top-down and bottom-up approaches to exploit their complementary strengths.
- Score: 42.01876518017639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D human pose estimation has made progress in recent years. Most
methods focus on a single person and estimate the pose in person-centric
coordinates, i.e., coordinates centered on the target person. Hence, these
methods are inapplicable to multi-person 3D pose estimation, where absolute
coordinates (e.g., camera coordinates) are required. Moreover, multi-person
pose estimation is more challenging than single-person estimation, due to
inter-person occlusion and close human interactions. Existing top-down
multi-person methods rely on human detection, and thus suffer from detection
errors and cannot produce reliable pose estimates in multi-person scenes.
Meanwhile, existing bottom-up methods do not use human detection and are
therefore unaffected by detection errors, but because they process all persons
in a scene at once, they are prone to errors, particularly for persons at
small scales. To address these
challenges, we propose the integration of top-down and bottom-up approaches to
exploit their strengths. Our top-down network estimates human joints from all
persons instead of one in an image patch, making it robust to possible
erroneous bounding boxes. Our bottom-up network incorporates human-detection
based normalized heatmaps, allowing the network to be more robust in handling
scale variations. Finally, the estimated 3D poses from the top-down and
bottom-up networks are fed into our integration network to produce the final 3D
poses. To address the common gap between training and testing data, we perform
optimization at test time, refining the estimated 3D human poses with a
high-order temporal constraint, a re-projection loss, and bone-length
regularization. Our
evaluations demonstrate the effectiveness of the proposed method. Code and
models are available: https://github.com/3dpose/3D-Multi-Person-Pose.
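The test-time refinement described in the abstract combines three terms. The sketch below is an illustrative NumPy reconstruction of that objective, not the authors' released code; the function name, weights, and argument layout are all assumptions. It combines a re-projection term against observed 2D joints, a second-order (acceleration) temporal smoothness term, and a bone-length term against reference lengths.

```python
import numpy as np

def refinement_loss(poses_3d, joints_2d, K, bone_pairs, ref_bone_lengths,
                    w_proj=1.0, w_temp=1.0, w_bone=1.0):
    """Illustrative test-time refinement objective (assumed, not the paper's code).

    poses_3d:         (T, J, 3) estimated 3D joints in camera coordinates
    joints_2d:        (T, J, 2) observed 2D joints in pixels
    K:                (3, 3) camera intrinsics
    bone_pairs:       list of (parent, child) joint indices
    ref_bone_lengths: (len(bone_pairs),) reference bone lengths
    """
    # Re-projection loss: project the 3D joints with the intrinsics
    # and compare against the observed 2D joints.
    proj = poses_3d @ K.T                     # (T, J, 3)
    proj_2d = proj[..., :2] / proj[..., 2:3]  # perspective division by depth
    loss_proj = np.mean((proj_2d - joints_2d) ** 2)

    # High-order temporal constraint: penalize the second-order finite
    # difference (acceleration) of each joint trajectory.
    accel = poses_3d[2:] - 2 * poses_3d[1:-1] + poses_3d[:-2]
    loss_temp = np.mean(accel ** 2)

    # Bone-length regularization: bone lengths should stay close to
    # the reference lengths in every frame.
    parents = [p for p, _ in bone_pairs]
    children = [c for _, c in bone_pairs]
    bones = poses_3d[:, children] - poses_3d[:, parents]  # (T, B, 3)
    lengths = np.linalg.norm(bones, axis=-1)              # (T, B)
    loss_bone = np.mean((lengths - ref_bone_lengths) ** 2)

    return w_proj * loss_proj + w_temp * loss_temp + w_bone * loss_bone
```

A static, correctly projected pose with matching bone lengths yields zero loss; at test time one would minimize this objective over the 3D joint positions with a gradient-based optimizer.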
Related papers
- Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z)
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks [33.974241749058585]
In multi-person pose estimation, human detection can be erroneous and human-joint grouping unreliable.
Existing top-down methods rely on human detection and thus suffer from these problems.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2021-04-05T07:05:21Z)
- Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views [22.86745487695168]
We propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras.
Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views.
arXiv Detail & Related papers (2021-01-24T16:28:10Z)
- PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation [35.791868530073955]
We present PandaNet, a new single-shot, anchor-based and multi-person 3D pose estimation approach.
The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass.
It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box.
arXiv Detail & Related papers (2021-01-07T10:32:17Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite its satisfactory performance in sparser crowd scenes, this formulation's effectiveness is frequently challenged in denser crowds.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
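Since epipolar constraints underpin the cross-view feature matching mentioned above, a minimal sketch may help: for a correspondence (x1, x2) between two calibrated views with fundamental matrix F, the point-to-epipolar-line distance should be near zero. The function below is an assumed, generic formulation (the symmetric epipolar distance), not code from any of the listed papers.

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Symmetric distance of a 2D joint correspondence to its epipolar lines.

    x1, x2: (2,) joint locations in view 1 and view 2 (pixels)
    F:      (3, 3) fundamental matrix mapping view-1 points to view-2 lines
    Small distances suggest the two detections belong to the same
    person's joint, which is how joints are associated across views.
    """
    p1 = np.array([x1[0], x1[1], 1.0])  # homogeneous coordinates
    p2 = np.array([x2[0], x2[1], 1.0])
    l2 = F @ p1          # epipolar line in view 2: a*x + b*y + c = 0
    l1 = F.T @ p2        # epipolar line in view 1
    d2 = abs(p2 @ l2) / np.hypot(l2[0], l2[1])  # point-to-line distance
    d1 = abs(p1 @ l1) / np.hypot(l1[0], l1[1])
    return 0.5 * (d1 + d2)
```

For a rectified stereo pair (pure horizontal translation), epipolar lines are horizontal scanlines, so a correct correspondence has the same vertical coordinate in both views and the distance is zero.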
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
arXiv Detail & Related papers (2020-07-17T12:44:23Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.