Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
- URL: http://arxiv.org/abs/2003.03972v3
- Date: Thu, 29 Jul 2021 03:02:33 GMT
- Title: Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
- Authors: Long Chen, Haizhou Ai, Rui Chen, Zijie Zhuang, Shuang Liu
- Abstract summary: We present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views.
It takes 2D poses in different camera coordinates as inputs and aims to recover accurate 3D poses in the global coordinate system.
We propose a new large-scale multi-human dataset with 12 to 28 camera views.
- Score: 13.191601826570786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D poses of multiple humans in real-time is a classic but still
challenging task in computer vision. Its major difficulty lies in the ambiguity
in cross-view association of 2D poses and the huge state space when there are
multiple people in multiple views. In this paper, we present a novel solution
for multi-human 3D pose estimation from multiple calibrated camera views. It
takes 2D poses in different camera coordinates as inputs and aims to recover
accurate 3D poses in the global coordinate system. Unlike previous methods that
associate 2D poses among all pairs of views from scratch at every frame, we
exploit the temporal consistency in videos to match the 2D inputs with 3D poses
directly in 3D space. More specifically, we propose to retain a 3D pose for
each person and update it iteratively via cross-view multi-human
tracking. This novel formulation improves both accuracy and efficiency, as we
demonstrated on widely-used public datasets. To further verify the scalability
of our method, we propose a new large-scale multi-human dataset with 12 to 28
camera views. Without bells and whistles, our solution achieves 154 FPS on 12
cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale
real-world applications. The proposed dataset is released at
https://github.com/longcw/crossview_3d_pose_tracking.
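To make the tracking formulation above concrete, here is a minimal, hypothetical Python sketch (not the authors' released code) of one frame of cross-view tracking: each retained 3D pose is matched to the incoming 2D poses of a view by reprojection error and then refined by DLT triangulation from its associated views. All names, thresholds, and data layouts below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def project(P, X):
    """Project 3D joints X (J, 3) with a 3x4 projection matrix P to (J, 2) pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]


def triangulate_joint(Ps, uvs):
    """DLT triangulation of one joint from >= 2 calibrated views."""
    A = []
    for P, (u, v) in zip(Ps, uvs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]


def track_frame(tracks, poses_2d, proj_mats, gate_px=50.0):
    """One frame of cross-view tracking (hypothetical sketch).

    tracks:    list of dicts {"pose3d": (J, 3) array, "obs": {cam: (J, 2) array}}
    poses_2d:  dict cam -> list of (J, 2) detected 2D poses in that view
    proj_mats: dict cam -> 3x4 projection matrix
    """
    # 1) Per camera, match incoming 2D poses to the retained 3D poses by
    #    mean reprojection error, instead of all-pairs 2D-2D association.
    for cam, dets in poses_2d.items():
        if not tracks or not dets:
            continue
        cost = np.zeros((len(tracks), len(dets)))
        for i, trk in enumerate(tracks):
            reproj = project(proj_mats[cam], trk["pose3d"])
            for j, det in enumerate(dets):
                cost[i, j] = np.linalg.norm(reproj - det, axis=1).mean()
        rows, cols = linear_sum_assignment(cost)
        for i, j in zip(rows, cols):
            if cost[i, j] < gate_px:  # gate out implausible matches
                tracks[i]["obs"][cam] = dets[j]

    # 2) Refine each retained 3D pose from its currently associated 2D observations.
    #    Track birth/death and stale observations are omitted in this sketch.
    for trk in tracks:
        if len(trk["obs"]) >= 2:
            cams = list(trk["obs"])
            Ps = [proj_mats[c] for c in cams]
            n_joints = trk["pose3d"].shape[0]
            trk["pose3d"] = np.stack([
                triangulate_joint(Ps, [trk["obs"][c][k] for c in cams])
                for k in range(n_joints)
            ])
    return tracks
```

Because the retained 3D poses carry identity across frames, the per-frame work reduces from an all-pairs cross-view association to a small 3D-to-2D matching per camera, which is consistent with the efficiency gain the abstract reports.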
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate decreases of up to 45% in MPJPE compared to the 3D poses obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
- Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks [36.49915280876899]
Cross-view person matching and 3D human pose estimation in multi-camera networks are difficult when the cameras are extrinsically uncalibrated.
Existing efforts require large amounts of 3D data for training neural networks or known camera poses for geometric constraints to solve the problem.
We present a method, PME, that solves the two tasks without requiring either information.
arXiv Detail & Related papers (2023-12-04T01:28:38Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- 3D Human Pose Estimation in Multi-View Operating Room Videos Using Differentiable Camera Projections [2.486571221735935]
We propose to directly optimise for localisation in 3D by training 2D CNNs end-to-end based on a 3D loss.
Using videos from the MVOR dataset, we show that this end-to-end approach outperforms optimisation in 2D space.
arXiv Detail & Related papers (2022-10-21T09:00:02Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views (a sketch of the epipolar scoring typically used for such correspondence appears after this list).
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views [22.86745487695168]
We propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras.
Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views.
arXiv Detail & Related papers (2021-01-24T16:28:10Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates in 3D space and therefore avoids making incorrect decisions in 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
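Several entries above (e.g. the plane-sweep stereo and iterative greedy matching papers) rely on explicitly establishing cross-view correspondences between 2D poses. The following is a minimal, hypothetical sketch of the pairwise scoring such association commonly uses: a symmetric epipolar distance between two calibrated views, with the fundamental matrix derived from the projection matrices. Function names and the usage pattern are illustrative assumptions, not any specific paper's implementation.

```python
import numpy as np


def fundamental_from_projections(P1, P2):
    """F mapping points in view 1 to epipolar lines in view 2 (F = [e2]_x P2 P1^+)."""
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]                                   # camera centre of view 1 (P1 @ C = 0)
    e2 = P2 @ C                                  # epipole in view 2
    e2_cross = np.array([[0, -e2[2], e2[1]],
                         [e2[2], 0, -e2[0]],
                         [-e2[1], e2[0], 0]])
    return e2_cross @ P2 @ np.linalg.pinv(P1)


def epipolar_affinity(pose1, pose2, F):
    """Mean symmetric point-to-epipolar-line distance between two (J, 2) poses.

    A small value suggests the two 2D poses are projections of the same person.
    """
    J = len(pose1)
    x1 = np.hstack([pose1, np.ones((J, 1))])     # homogeneous joints, view 1
    x2 = np.hstack([pose2, np.ones((J, 1))])     # homogeneous joints, view 2
    lines2 = (F @ x1.T).T                        # epipolar lines of x1 in view 2
    lines1 = (F.T @ x2.T).T                      # epipolar lines of x2 in view 1
    d2 = np.abs(np.sum(lines2 * x2, axis=1)) / np.linalg.norm(lines2[:, :2], axis=1)
    d1 = np.abs(np.sum(lines1 * x1, axis=1)) / np.linalg.norm(lines1[:, :2], axis=1)
    return 0.5 * float((d1 + d2).mean())
```

Pairs of 2D poses whose affinity falls below a chosen threshold can then be grouped across views, greedily or via graph partitioning, before the grouped poses are triangulated into 3D.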