Direct Multi-view Multi-person 3D Pose Estimation
- URL: http://arxiv.org/abs/2111.04076v1
- Date: Sun, 7 Nov 2021 13:09:20 GMT
- Title: Direct Multi-view Multi-person 3D Pose Estimation
- Authors: Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng
- Abstract summary: We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Multi-view Pose transformer (MvP) for estimating multi-person 3D
poses from multi-view images. Instead of estimating 3D joint locations from
costly volumetric representation or reconstructing the per-person 3D pose from
multiple detected 2D poses as in previous methods, MvP directly regresses the
multi-person 3D poses in a clean and efficient way, without relying on
intermediate tasks. Specifically, MvP represents skeleton joints as learnable
query embeddings and lets them progressively attend to and reason over the
multi-view information from the input images to directly regress the actual 3D
joint locations. To improve the accuracy of such a simple pipeline, MvP
presents a hierarchical scheme to concisely represent query embeddings of
multi-person skeleton joints and introduces an input-dependent query adaptation
approach. Further, MvP designs a novel geometrically guided attention
mechanism, called projective attention, to more precisely fuse the cross-view
information for each joint. MvP also introduces a RayConv operation to
integrate the view-dependent camera geometry into the feature representations
for augmenting the projective attention. We show experimentally that our MvP
model outperforms the state-of-the-art methods on several benchmarks while
being much more efficient. Notably, it achieves 92.3% AP25 on the challenging
Panoptic dataset, improving upon the previous best approach [36] by 9.8%. MvP
is general and also extendable to recovering human mesh represented by the SMPL
model, thus useful for modeling multi-person body shapes. Code and models are
available at https://github.com/sail-sg/mvp.
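The projective attention described in the abstract — projecting a joint's current 3D estimate into each camera view, sampling features there, and fusing them across views — can be sketched roughly as follows. This is an illustrative NumPy sketch under simplifying assumptions, not the authors' implementation: it handles a single joint with a plain per-view dot-product softmax, whereas the paper's mechanism operates inside transformer decoder layers; the function names and scoring scheme here are invented for illustration.

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3D world point X (3,) to pixel coordinates with a pinhole
    camera described by intrinsics K (3x3), rotation R (3x3), translation t (3,)."""
    x_cam = R @ X + t            # world -> camera frame
    u, v, w = K @ x_cam          # camera frame -> homogeneous pixels
    return np.array([u / w, v / w])

def bilinear_sample(feat, uv):
    """Bilinearly sample a feature map feat (H, W, C) at a continuous pixel (u, v)."""
    H, W, _ = feat.shape
    u = float(np.clip(uv[0], 0, W - 1.001))
    v = float(np.clip(uv[1], 0, H - 1.001))
    u0, v0 = int(u), int(v)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * feat[v0, u0]
            + du * (1 - dv) * feat[v0, u0 + 1]
            + (1 - du) * dv * feat[v0 + 1, u0]
            + du * dv * feat[v0 + 1, u0 + 1])

def projective_attention(query, X_est, cams, feats):
    """Fuse cross-view features for one joint: project the current 3D joint
    estimate into every view, sample features at the projections, and blend
    them with softmax weights derived from the joint's query embedding."""
    sampled = np.stack([bilinear_sample(f, project_point(K, R, t, X_est))
                        for (K, R, t), f in zip(cams, feats)])   # (V, C)
    logits = sampled @ query / np.sqrt(query.size)               # one score per view
    w = np.exp(logits - logits.max())
    w /= w.sum()                                                 # softmax over views
    return w @ sampled                                           # (C,) fused feature
```

The key property this sketch shares with projective attention is that each joint only attends along its own epipolar geometry: features are gathered at the 2D projections of one 3D hypothesis rather than over the full image.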
Related papers
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM simultaneously estimates the relative camera poses in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
- AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed as AdaptivePose.
We employ AdaptivePose for both 2D and 3D multi-person pose estimation tasks to verify its effectiveness.
arXiv Detail & Related papers (2022-10-08T12:54:20Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
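The plane-sweep idea summarized above can be illustrated with a minimal sketch: hypothesize depths along the ray of a 2D joint detected in a reference view, reproject each 3D hypothesis into another view, and keep the depth whose reprojection lands closest to a 2D detection there. This is only a geometric toy, not the paper's single-shot network formulation, and all names below are hypothetical.

```python
import numpy as np

def backproject(K, uv, depth):
    """Lift pixel (u, v) in the reference view to a 3D point at the given depth,
    expressed in the reference camera's coordinate frame."""
    u, v = uv
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

def sweep_depths(K_ref, uv_ref, K_other, R, t, joints_other, depths):
    """Plane-sweep over candidate depths: back-project the reference 2D joint
    to a 3D hypothesis at each depth, reproject it into the other view, and
    keep the depth whose reprojection is nearest to any 2D detection there."""
    best_depth, best_err = None, np.inf
    for d in depths:
        X = backproject(K_ref, uv_ref, d)     # 3D hypothesis (reference frame)
        x = K_other @ (R @ X + t)             # reproject into the other view
        uv = x[:2] / x[2]
        err = min(np.linalg.norm(uv - np.asarray(j)) for j in joints_other)
        if err < best_err:
            best_depth, best_err = d, err
    return best_depth
```

Because the sweep scores each depth directly against the other view's detections, it sidesteps the explicit cross-view person-matching step that earlier multi-view pipelines relied on.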
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment-friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video [38.63662549684785]
The MVM method generates reliable 3D human poses from a large-scale video dataset.
We train a neural network that takes a single image as the input for multi-person 3D pose estimation.
arXiv Detail & Related papers (2020-04-11T01:09:50Z)
- Light3DPose: Real-time Multi-Person 3D Pose Estimation from Multiple Views [5.510992382274774]
We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views.
Our architecture aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene.
The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene.
arXiv Detail & Related papers (2020-04-06T14:12:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.