Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2208.00090v1
- Date: Fri, 29 Jul 2022 22:12:50 GMT
- Title: Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation
- Authors: Qihao Liu, Yi Zhang, Song Bai, Alan Yuille
- Abstract summary: Occlusion poses a great threat to monocular multi-person 3D human pose estimation due to large variability in terms of the shape, appearance, and position of occluders.
Existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning.
Inspired by how humans infer occluded joints from visible cues, we develop a method that explicitly models this reasoning process and significantly improves bottom-up multi-person human pose estimation.
- Score: 33.86986028882488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occlusion poses a great threat to monocular multi-person 3D human pose
estimation due to large variability in terms of the shape, appearance, and
position of occluders. While existing methods try to handle occlusion with pose
priors/constraints, data augmentation, or implicit reasoning, they still fail
to generalize to unseen poses or occlusion cases and may make large mistakes
when multiple people are present. Inspired by the remarkable ability of humans
to infer occluded joints from visible cues, we develop a method to explicitly
model this process that significantly improves bottom-up multi-person human
pose estimation with or without occlusions. First, we split the task into two
subtasks: visible keypoints detection and occluded keypoints reasoning, and
propose a Deeply Supervised Encoder Distillation (DSED) network to solve the
second one. To train our model, we propose a Skeleton-guided human Shape
Fitting (SSF) approach to generate pseudo occlusion labels on the existing
datasets, enabling explicit occlusion reasoning. Experiments show that
explicitly learning from occlusions improves human pose estimation. In
addition, exploiting feature-level information of visible joints allows us to
reason about occluded joints more accurately. Our method outperforms both the
state-of-the-art top-down and bottom-up methods on several benchmarks.
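The abstract describes splitting supervision into visible-keypoint detection and occluded-keypoint reasoning, driven by per-joint pseudo visibility labels from the SSF fitting step. As a rough, hedged illustration of that split (not the authors' DSED/SSF code), the sketch below supervises two heatmap heads with a visibility mask; the function name and tensor shapes are assumptions made for illustration.

```python
# Hypothetical sketch, not the authors' DSED/SSF implementation:
# supervise a "visible detection" head and an "occluded reasoning" head
# separately, using per-joint pseudo visibility labels.
import torch
import torch.nn.functional as F

def split_supervision_loss(pred_vis, pred_occ, gt_heatmaps, visibility):
    """
    pred_vis, pred_occ: (B, J, H, W) heatmaps from the two heads.
    gt_heatmaps:        (B, J, H, W) ground-truth joint heatmaps.
    visibility:         (B, J) mask, 1 = visible joint, 0 = occluded joint.
    """
    mask = visibility[:, :, None, None].float()        # broadcast to heatmap shape
    loss_visible = F.mse_loss(pred_vis * mask, gt_heatmaps * mask)
    loss_occluded = F.mse_loss(pred_occ * (1 - mask), gt_heatmaps * (1 - mask))
    return loss_visible + loss_occluded
```

In practice the occluded-reasoning head would also consume features of the visible joints, which is the feature-level cue the abstract refers to.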
Related papers
- DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery.
It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
- A comprehensive framework for occluded human pose estimation [10.92234109536279]
Occlusion presents a significant challenge in human pose estimation.
We propose DAG (Data, Attention, Graph) to address the performance degradation caused by occlusion in human pose estimation.
We also present the Feature-Guided Multi-Hop GCN (FGMP-GCN) to fully explore the prior knowledge of body structure and improve pose estimation results.
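The FGMP-GCN mentioned above applies graph convolutions over the body structure with multi-hop connectivity. The following is a generic multi-hop GCN layer over a skeleton adjacency matrix, included only as a schematic of the idea; the actual FGMP-GCN design is not given in the summary and may differ.

```python
# Generic multi-hop GCN layer over a skeleton graph (illustrative only;
# the actual FGMP-GCN architecture may differ).
import torch
import torch.nn as nn

class MultiHopGCNLayer(nn.Module):
    """Aggregates joint features over 1..K hop neighborhoods of the skeleton."""
    def __init__(self, in_dim, out_dim, adjacency, num_hops=2):
        super().__init__()
        # adjacency: (J, J) float tensor, 1 where two joints are connected.
        A = adjacency + torch.eye(adjacency.size(0))       # add self-loops
        hops, A_k = [], torch.eye(adjacency.size(0))
        for _ in range(num_hops):
            A_k = A_k @ A                                   # k-hop connectivity
            hops.append(A_k / A_k.sum(dim=1, keepdim=True)) # row-normalize
        self.register_buffer("hops", torch.stack(hops))     # (K, J, J)
        self.weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_hops)]
        )

    def forward(self, x):                                   # x: (B, J, in_dim)
        out = 0
        for A_k, lin in zip(self.hops, self.weights):
            out = out + torch.einsum("ij,bjc->bic", A_k, lin(x))
        return torch.relu(out)
```

For a COCO-style skeleton, adjacency would be a 17 x 17 binary matrix marking connected joints.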
arXiv Detail & Related papers (2023-12-30T06:55:30Z)
- Learning Visibility for Robust Dense Human Body Estimation [78.37389398573882]
Estimating 3D human pose and shape from 2D images is a crucial yet challenging task.
We learn dense human body estimation that is robust to partial observations.
We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates.
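A minimal sketch of the training signal described above, assuming pseudo visibility labels have already been derived from dense UV correspondences: the network regresses 3D coordinates and a visibility logit per point, with the coordinate loss down-weighted on invisible points. The function name, shapes, and the masking choice are illustrative assumptions, not the paper's implementation.

```python
# Illustrative loss combining visibility prediction with 3D regression.
import torch
import torch.nn.functional as F

def visibility_aware_loss(pred_xyz, pred_vis_logit, gt_xyz, gt_vis, w_vis=1.0):
    """
    pred_xyz:       (B, N, 3) predicted 3D coordinates.
    pred_vis_logit: (B, N)    predicted visibility logits.
    gt_xyz:         (B, N, 3) ground-truth 3D coordinates.
    gt_vis:         (B, N)    pseudo visibility labels in {0, 1}.
    """
    gt_vis_f = gt_vis.float()
    # Assumption for illustration: supervise coordinates only on visible points.
    coord_err = (pred_xyz - gt_xyz).norm(dim=-1)                      # (B, N)
    loss_xyz = (coord_err * gt_vis_f).sum() / gt_vis_f.sum().clamp(min=1)
    # Binary cross-entropy on the visibility head.
    loss_vis = F.binary_cross_entropy_with_logits(pred_vis_logit, gt_vis_f)
    return loss_xyz + w_vis * loss_vis
```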
arXiv Detail & Related papers (2022-08-23T00:01:05Z)
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single-person pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which comprises a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
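One generic way to quantify prediction uncertainty from two output heads is to measure their disagreement, aggregated per joint and per pose. The sketch below shows that proxy only; it is not necessarily the measure derived in the paper, and the names and shapes are hypothetical.

```python
# Generic two-head disagreement proxy for prediction uncertainty
# (illustrative; the paper derives its own measures).
import torch

def disagreement_uncertainty(pose_head_a, pose_head_b):
    """
    pose_head_a, pose_head_b: (B, J, 3) 3D joint predictions from the two heads.
    Returns per-joint and per-pose uncertainty scores.
    """
    joint_uncertainty = (pose_head_a - pose_head_b).norm(dim=-1)   # (B, J)
    pose_uncertainty = joint_uncertainty.mean(dim=-1)              # (B,)
    return joint_uncertainty, pose_uncertainty
```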
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks [33.974241749058585]
In multi-person pose estimation, human detection can be erroneous and human-joints grouping can be unreliable.
Existing top-down methods rely on human detection and thus suffer from these problems.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2021-04-05T07:05:21Z)
- AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild [77.43884383743872]
We present AdaFuse, an adaptive multiview fusion method to enhance the features in occluded views.
We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic.
We also create a large-scale synthetic dataset, Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints.
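A schematic of adaptive multiview fusion in the spirit described above: each view's joint heatmaps are combined with learned, per-view weights. This ignores the epipolar warping that a real multiview fusion method such as AdaFuse performs, so treat it purely as an illustration; the class and its scoring scheme are hypothetical.

```python
# Schematic adaptive fusion of per-view heatmaps with learned view weights
# (illustration only; AdaFuse itself fuses along epipolar geometry).
import torch
import torch.nn as nn

class AdaptiveViewFusion(nn.Module):
    def __init__(self, num_joints):
        super().__init__()
        # Tiny scorer mapping a per-view, per-joint heatmap summary to a weight.
        self.scorer = nn.Linear(num_joints, num_joints)

    def forward(self, heatmaps):
        """heatmaps: (B, V, J, H, W) joint heatmaps from V camera views."""
        # Use each view's per-joint peak response as a crude quality summary.
        quality = heatmaps.amax(dim=(-2, -1))                 # (B, V, J)
        weights = torch.softmax(self.scorer(quality), dim=1)  # normalize over views
        fused = (weights[..., None, None] * heatmaps).sum(dim=1)  # (B, J, H, W)
        return fused
```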
arXiv Detail & Related papers (2020-10-26T03:19:46Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
While this formulation performs satisfactorily in sparser crowd scenes, its effectiveness is frequently challenged in denser crowds.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
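The epipolar constraint referred to above is the standard geometric test used to match 2D joint detections across calibrated views. A textbook point-to-epipolar-line distance is sketched below; it is not this paper's full crowd-matching formulation.

```python
# Point-to-epipolar-line distance, the basic test behind epipolar
# constraints for matching 2D joint detections across views.
import numpy as np

def epipolar_distance(x1, x2, F):
    """
    x1: (N, 2) points in view 1; x2: (N, 2) candidate matches in view 2.
    F:  (3, 3) fundamental matrix mapping view-1 points to epipolar lines in view 2.
    Returns the distance of each x2 to the epipolar line of its corresponding x1.
    """
    ones = np.ones((x1.shape[0], 1))
    x1_h = np.hstack([x1, ones])            # homogeneous coordinates
    x2_h = np.hstack([x2, ones])
    lines = x1_h @ F.T                      # epipolar lines l = F @ x1, shape (N, 3)
    num = np.abs(np.sum(lines * x2_h, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return num / den
```

Pairs with small distances satisfy the constraint and are candidate cross-view matches; dense crowds produce many such ambiguous candidates, which is the failure mode the paper addresses.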
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.