Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
- URL: http://arxiv.org/abs/2104.02273v1
- Date: Tue, 6 Apr 2021 03:49:35 GMT
- Title: Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
- Authors: Jiahao Lin, Gim Hee Lee
- Abstract summary: Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
- Score: 71.59494156155309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches for multi-view multi-person 3D pose estimation explicitly
establish cross-view correspondences to group 2D pose detections from multiple
camera views and solve for the 3D pose estimation for each person. Establishing
cross-view correspondences is challenging in multi-person scenes, and incorrect
correspondences will lead to sub-optimal performance for the multi-stage
pipeline. In this work, we present our multi-view 3D pose estimation approach
based on plane sweep stereo to jointly address the cross-view fusion and 3D
pose reconstruction in a single shot. Specifically, we propose to perform depth
regression for each joint of each 2D pose in a target camera view. Cross-view
consistency constraints are implicitly enforced by multiple reference camera
views via the plane sweep algorithm to facilitate accurate depth regression. We
adopt a coarse-to-fine scheme to first regress the person-level depth followed
by a per-person joint-level relative depth estimation. 3D poses are obtained
from a simple back-projection given the estimated depths. We evaluate our
approach on benchmark datasets where it outperforms previous state-of-the-arts
while being remarkably efficient. Our code is available at
https://github.com/jiahaoLjh/PlaneSweepPose.
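The depth regression described above is learned end-to-end in the paper, but the underlying plane sweep geometry can be illustrated with a small self-contained sketch: lift a 2D joint to a set of hypothesized depths, reproject each hypothesis into the reference views, and keep the depth that agrees best with the 2D detections there. This is not the authors' code; the function names, the nearest-joint matching cost, and the fixed depth range are illustrative assumptions.

```python
import numpy as np

def back_project(uv, depth, K):
    """Back-project a 2D pixel (u, v) at a given depth into 3D camera coordinates."""
    u, v = uv
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])

def plane_sweep_depth(joint_uv, K_tgt, ref_views, depth_range=(2.0, 8.0), n_planes=64):
    """Score fronto-parallel depth hypotheses for one 2D joint in the target view.

    For each candidate depth, the joint is lifted to 3D in the target camera,
    reprojected into every reference view, and scored by its distance to the
    nearest detected 2D joint there (an assumed stand-in for the learned score).
    Each entry of `ref_views` is (R, t, K_ref, ref_joints), mapping target-camera
    coordinates into the reference camera.
    """
    depths = np.linspace(*depth_range, n_planes)
    costs = np.zeros(n_planes)
    for i, d in enumerate(depths):
        X_tgt = back_project(joint_uv, d, K_tgt)   # hypothesized 3D point
        for R, t, K_ref, ref_joints in ref_views:
            X_ref = R @ X_tgt + t                  # into the reference camera frame
            uvw = K_ref @ X_ref
            uv_ref = uvw[:2] / uvw[2]              # perspective projection
            # accumulate distance to the closest 2D joint detection in this view
            costs[i] += np.min(np.linalg.norm(ref_joints - uv_ref, axis=1))
    return depths[np.argmin(costs)]
```

Once a depth is selected this way (or regressed, as in the paper), the final 3D joint is obtained by the same `back_project` call, which is the "simple back-projection given the estimated depths" the abstract refers to.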
Related papers
- Self-learning Canonical Space for Multi-view 3D Human Pose Estimation [57.969696744428475]
Multi-view 3D human pose estimation is naturally superior to single-view estimation.
However, accurate annotations for this task are hard to obtain.
We propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet).
CMANet is superior to state-of-the-art methods in extensive quantitative and qualitative analysis.
arXiv Detail & Related papers (2024-03-19T04:54:59Z) - Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras [36.59439020480503]
We tackle the task of multi-view, multi-person 3D human pose estimation from a limited number of uncalibrated depth cameras.
We propose to leverage sparse, uncalibrated depth cameras providing RGBD video streams for 3D human pose estimation.
arXiv Detail & Related papers (2024-01-28T10:06:17Z) - Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction [3.069335774032178]
We propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences.
We conduct experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons.
arXiv Detail & Related papers (2023-12-28T16:30:05Z) - DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.
We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.
Our formulation directly provides a 3D model of the scene as well as depth information; interestingly, pixel matches and relative and absolute camera parameters can be seamlessly recovered from it.
arXiv Detail & Related papers (2023-12-21T18:52:14Z) - Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z) - VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z) - SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z) - VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates in 3D space, thereby avoiding incorrect decisions in the 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.