Related papers: Direct Dense Pose Estimation

URL: http://arxiv.org/abs/2204.01263v1
Date: Mon, 4 Apr 2022 06:14:38 GMT
Title: Direct Dense Pose Estimation
Authors: Liqian Ma, Lingjie Liu, Christian Theobalt, Luc Van Gool
Abstract summary: Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies. Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner of first attempting to identify a bounding box for each person. We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
Score: 138.56533828316833
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies, which finds various applications, such as human body reconstruction, human pose transfer, and human action recognition. Prior dense pose estimation methods are all based on Mask R-CNN framework and operate in a top-down manner of first attempting to identify a bounding box for each person and matching dense correspondences in each bounding box. Consequently, these methods lack robustness due to their critical dependence on the Mask R-CNN detection, and the runtime increases drastically as the number of persons in the image increases. We therefore propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP). DDP first predicts the instance mask and global IUV representation separately and then combines them together. We also propose a simple yet effective 2D temporal-smoothing scheme to alleviate the temporal jitters when dealing with video data. Experiments demonstrate that DDP overcomes the limitations of previous top-down baseline methods and achieves competitive accuracy. In addition, DDP is computationally more efficient than previous dense pose estimation methods, and it reduces jitters when applied to a video sequence, which is a problem plaguing the previous methods.
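The combination step mentioned in the abstract (instance masks plus one global IUV map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the list-of-lists image layout, and the convention that part index 0 means background are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical per-pixel record: (part index I, surface coordinates U and V).
# Assumption for this sketch: I == 0 marks background.
IUV = Tuple[int, float, float]
BACKGROUND: IUV = (0, 0.0, 0.0)


def combine_masks_with_global_iuv(
    instance_masks: List[List[List[bool]]],
    global_iuv: List[List[IUV]],
) -> List[List[List[IUV]]]:
    """Return one IUV map per person.

    Each output map copies the global IUV value wherever that person's
    mask is set and holds BACKGROUND everywhere else.
    """
    per_instance = []
    for mask in instance_masks:
        per_instance.append([
            [iuv if inside else BACKGROUND
             for inside, iuv in zip(mask_row, iuv_row)]
            for mask_row, iuv_row in zip(mask, global_iuv)
        ])
    return per_instance


# Toy 2x3 image containing two people.
global_iuv = [
    [(1, 0.1, 0.2), (1, 0.3, 0.4), (2, 0.5, 0.6)],
    [(0, 0.0, 0.0), (2, 0.7, 0.8), (2, 0.9, 1.0)],
]
masks = [
    [[True, True, False], [False, False, False]],   # person 0
    [[False, False, True], [False, True, True]],    # person 1
]
result = combine_masks_with_global_iuv(masks, global_iuv)
```

In this sketch the global map is predicted once for the whole image, so the per-person work reduces to a masking pass rather than a separate network evaluation per bounding box.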

Related papers

Polar Coordinate-Based 2D Pose Prior with Neural Distance Field [0.34952465649465553]
We propose a 2D pose prior-guided refinement approach based on Neural Distance Fields (NDF). We introduce a polar coordinate-based representation that explicitly incorporates joint connection lengths, enabling a more accurate correction of erroneous pose estimations. Our method is evaluated on a long jump dataset, demonstrating its ability to improve 2D pose estimation across multiple pose representations.
arXiv Detail & Related papers (2025-05-06T11:31:14Z)
SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network. It can perform inference at 32 FPS without requiring inputs other than the RGB image. It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses. We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding [35.49066795648395]
In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion. We introduce ORTexME, an occlusion-robust temporal method that utilizes temporal information from the input video to better regularize the occluded body parts. Our method achieves significant improvement on the challenging multi-person 3DPW dataset, with a 1.8 P-MPJPE error reduction.
arXiv Detail & Related papers (2023-09-21T15:50:04Z)
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth. Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation [33.86986028882488]
Occlusion poses a great threat to monocular multi-person 3D human pose estimation due to large variability in terms of the shape, appearance, and position of occluders. Existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning. We develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation.
arXiv Detail & Related papers (2022-07-29T22:12:50Z)
Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single pose estimation. Existing top-down and bottom-up approaches to pose estimation suffer from detection errors. We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [78.83305967085413]
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task. Our method outperforms state-of-the-art methods with fewer parameters and less computational overhead.
arXiv Detail & Related papers (2022-03-15T04:00:59Z)
Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction [94.25865526414717]
This paper considers a new problem of adapting a pre-trained model of human mesh reconstruction to out-of-domain streaming videos. We propose Bilevel Online Adaptation (BOA), which divides the overall multi-objective optimization process into two steps, weight probe and weight update, in a training iteration. We demonstrate that BOA leads to state-of-the-art results on two human mesh reconstruction benchmarks.
arXiv Detail & Related papers (2021-03-30T15:47:58Z)
Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.