Direct Dense Pose Estimation
- URL: http://arxiv.org/abs/2204.01263v1
- Date: Mon, 4 Apr 2022 06:14:38 GMT
- Title: Direct Dense Pose Estimation
- Authors: Liqian Ma, Lingjie Liu, Christian Theobalt, Luc Van Gool
- Abstract summary: Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies.
Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner, first attempting to identify a bounding box for each person.
We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
- Score: 138.56533828316833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense human pose estimation is the problem of learning dense correspondences
between RGB images and the surfaces of human bodies, which finds various
applications, such as human body reconstruction, human pose transfer, and human
action recognition. Prior dense pose estimation methods are all based on the
Mask R-CNN framework and operate in a top-down manner: they first attempt to
identify a bounding box for each person and then match dense correspondences in
each bounding box. Consequently, these methods lack robustness due to their
critical dependence on the Mask R-CNN detection, and the runtime increases
drastically as the number of persons in the image increases. We therefore
propose a novel alternative method for solving the dense pose estimation
problem, called Direct Dense Pose (DDP). DDP first predicts the instance mask
and global IUV representation separately and then combines them. We
also propose a simple yet effective 2D temporal-smoothing scheme to alleviate
temporal jitter when dealing with video data. Experiments demonstrate that
DDP overcomes the limitations of previous top-down baseline methods and
achieves competitive accuracy. In addition, DDP is computationally more
efficient than previous dense pose estimation methods, and it reduces jitter
when applied to a video sequence, a problem that plagues previous
methods.
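The two-stage idea described in the abstract (predict instance masks and an image-wide IUV map separately, then combine them) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the array shapes, the function names `combine_instances` and `smooth_uv`, and the use of an exponential moving average for temporal smoothing are all assumptions made for illustration. The abstract does not say how DDP's smoothing scheme actually works.

```python
import numpy as np


def combine_instances(instance_masks, global_iuv):
    """Split one image-wide IUV map into per-person IUV maps.

    instance_masks: bool array, shape (N, H, W), one mask per person.
    global_iuv:     float array, shape (3, H, W), channels = (I, U, V).
    Returns a float array of shape (N, 3, H, W); pixels outside a
    person's mask are zeroed (treated here as background).
    """
    masks = np.asarray(instance_masks, dtype=bool)
    iuv = np.asarray(global_iuv, dtype=float)
    # Broadcast (N, 1, H, W) * (1, 3, H, W) -> (N, 3, H, W).
    return masks[:, None, :, :] * iuv[None, :, :, :]


def smooth_uv(prev_uv, curr_uv, alpha=0.5):
    """Blend the U/V maps of two consecutive frames.

    Hypothetical stand-in for a 2D temporal-smoothing step: a plain
    exponential moving average, where alpha weights the current frame.
    """
    prev_uv = np.asarray(prev_uv, dtype=float)
    curr_uv = np.asarray(curr_uv, dtype=float)
    return alpha * curr_uv + (1.0 - alpha) * prev_uv


# Toy 2x2 image with two people: one on the top row, one on the bottom row.
masks = np.zeros((2, 2, 2), dtype=bool)
masks[0, 0, :] = True
masks[1, 1, :] = True
iuv = np.arange(12, dtype=float).reshape(3, 2, 2)

per_person = combine_instances(masks, iuv)
print(per_person.shape)  # (2, 3, 2, 2)

# Smooth only the U and V channels between a previous and a current frame.
smoothed = smooth_uv(np.zeros((2, 2, 2)), np.ones((2, 2, 2)), alpha=0.5)
```

Only U and V are smoothed in this sketch. The part index I is a discrete label, so averaging it across frames would not be meaningful.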
Related papers
- SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network.
It can perform inference at 32 FPS without requiring inputs other than the RGB image.
It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding [35.49066795648395]
In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion.
We introduce ORTexME, an occlusion-robust temporal method that utilizes temporal information from the input video to better regularize the occluded body parts.
Our method achieves significant improvement on the challenging multi-person 3DPW dataset, reducing P-MPJPE error by 1.8.
arXiv Detail & Related papers (2023-09-21T15:50:04Z)
- Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation [33.86986028882488]
Occlusion poses a great threat to monocular multi-person 3D human pose estimation due to large variability in terms of the shape, appearance, and position of occluders.
Existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning.
We develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation.
arXiv Detail & Related papers (2022-07-29T22:12:50Z)
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single-person pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [78.83305967085413]
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task.
Our method outperforms state-of-the-art methods with fewer parameters and less computational overhead.
arXiv Detail & Related papers (2022-03-15T04:00:59Z)
- Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction [94.25865526414717]
This paper considers a new problem of adapting a pre-trained model of human mesh reconstruction to out-of-domain streaming videos.
We propose Bilevel Online Adaptation, which divides the overall multi-objective optimization process into two steps, weight probe and weight update, in a training iteration.
We demonstrate that BOA leads to state-of-the-art results on two human mesh reconstruction benchmarks.
arXiv Detail & Related papers (2021-03-30T15:47:58Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)