Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
- URL: http://arxiv.org/abs/2003.03972v3
- Date: Thu, 29 Jul 2021 03:02:33 GMT
- Title: Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
- Authors: Long Chen, Haizhou Ai, Rui Chen, Zijie Zhuang, Shuang Liu
- Abstract summary: We present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views.
It takes 2D poses in different camera coordinates as inputs and aims to recover accurate 3D poses in the global coordinate system.
We propose a new large-scale multi-human dataset with 12 to 28 camera views.
- Score: 13.191601826570786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D poses of multiple humans in real-time is a classic but still
challenging task in computer vision. Its major difficulty lies in the ambiguity
in cross-view association of 2D poses and the huge state space when there are
multiple people in multiple views. In this paper, we present a novel solution
for multi-human 3D pose estimation from multiple calibrated camera views. It
takes 2D poses in different camera coordinates as inputs and aims to recover
accurate 3D poses in the global coordinate system. Unlike previous methods that
associate 2D poses among all pairs of views from scratch at every frame, we
exploit the temporal consistency in videos to match the 2D inputs with 3D poses
directly in 3D space. More specifically, we propose to retain a 3D pose for
each person and update it iteratively via cross-view multi-human
tracking. This novel formulation improves both accuracy and efficiency, as we
demonstrated on widely-used public datasets. To further verify the scalability
of our method, we propose a new large-scale multi-human dataset with 12 to 28
camera views. Without bells and whistles, our solution achieves 154 FPS on 12
cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale
real-world applications. The proposed dataset is released at
https://github.com/longcw/crossview_3d_pose_tracking.
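To make the tracking formulation above concrete, here is a minimal, hypothetical Python sketch (not the authors' released code) of one frame of cross-view tracking: each retained 3D pose is matched to the incoming 2D poses of a view by reprojection error and then refined by DLT triangulation from its associated views. All names, thresholds, and data layouts below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def project(P, X):
    """Project 3D joints X (J, 3) with a 3x4 projection matrix P to (J, 2) pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]


def triangulate_joint(Ps, uvs):
    """DLT triangulation of one joint from >= 2 calibrated views."""
    A = []
    for P, (u, v) in zip(Ps, uvs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]


def track_frame(tracks, poses_2d, proj_mats, gate_px=50.0):
    """One frame of cross-view tracking (hypothetical sketch).

    tracks:    list of dicts {"pose3d": (J, 3) array, "obs": {cam: (J, 2) array}}
    poses_2d:  dict cam -> list of (J, 2) detected 2D poses in that view
    proj_mats: dict cam -> 3x4 projection matrix
    """
    # 1) Per camera, match incoming 2D poses to the retained 3D poses by
    #    mean reprojection error, instead of all-pairs 2D-2D association.
    for cam, dets in poses_2d.items():
        if not tracks or not dets:
            continue
        cost = np.zeros((len(tracks), len(dets)))
        for i, trk in enumerate(tracks):
            reproj = project(proj_mats[cam], trk["pose3d"])
            for j, det in enumerate(dets):
                cost[i, j] = np.linalg.norm(reproj - det, axis=1).mean()
        rows, cols = linear_sum_assignment(cost)
        for i, j in zip(rows, cols):
            if cost[i, j] < gate_px:  # gate out implausible matches
                tracks[i]["obs"][cam] = dets[j]

    # 2) Refine each retained 3D pose from its currently associated 2D observations.
    #    Track birth/death and stale observations are omitted in this sketch.
    for trk in tracks:
        if len(trk["obs"]) >= 2:
            cams = list(trk["obs"])
            Ps = [proj_mats[c] for c in cams]
            n_joints = trk["pose3d"].shape[0]
            trk["pose3d"] = np.stack([
                triangulate_joint(Ps, [trk["obs"][c][k] for c in cams])
                for k in range(n_joints)
            ])
    return tracks
```

Because the retained 3D poses carry identity across frames, the per-frame work reduces from an all-pairs cross-view association to a small 3D-to-2D matching per camera, which is consistent with the efficiency gain the abstract reports.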
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate decreases of up to 45% in MPJPE compared to the 3D poses obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
- Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks [36.49915280876899]
Cross-view person matching and 3D human pose estimation in multi-camera networks are difficult when the cameras are extrinsically uncalibrated.
Existing efforts require large amounts of 3D data for training neural networks or known camera poses for geometric constraints to solve the problem.
We present a method, PME, that solves the two tasks without requiring either information.
arXiv Detail & Related papers (2023-12-04T01:28:38Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- 3D Human Pose Estimation in Multi-View Operating Room Videos Using Differentiable Camera Projections [2.486571221735935]
We propose to directly optimise for localisation in 3D by training 2D CNNs end-to-end based on a 3D loss.
Using videos from the MVOR dataset, we show that this end-to-end approach outperforms optimisation in 2D space.
arXiv Detail & Related papers (2022-10-21T09:00:02Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views (a sketch of the epipolar scoring typically used for such correspondence appears after this list).
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views [22.86745487695168]
We propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras.
Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views.
arXiv Detail & Related papers (2021-01-24T16:28:10Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates in 3D space and therefore avoids making incorrect decisions in 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
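Several entries above (e.g. the plane-sweep stereo and iterative greedy matching papers) rely on explicitly establishing cross-view correspondences between 2D poses. The following is a minimal, hypothetical sketch of the pairwise scoring such association commonly uses: a symmetric epipolar distance between two calibrated views, with the fundamental matrix derived from the projection matrices. Function names and the usage pattern are illustrative assumptions, not any specific paper's implementation.

```python
import numpy as np


def fundamental_from_projections(P1, P2):
    """F mapping points in view 1 to epipolar lines in view 2 (F = [e2]_x P2 P1^+)."""
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]                                   # camera centre of view 1 (P1 @ C = 0)
    e2 = P2 @ C                                  # epipole in view 2
    e2_cross = np.array([[0, -e2[2], e2[1]],
                         [e2[2], 0, -e2[0]],
                         [-e2[1], e2[0], 0]])
    return e2_cross @ P2 @ np.linalg.pinv(P1)


def epipolar_affinity(pose1, pose2, F):
    """Mean symmetric point-to-epipolar-line distance between two (J, 2) poses.

    A small value suggests the two 2D poses are projections of the same person.
    """
    J = len(pose1)
    x1 = np.hstack([pose1, np.ones((J, 1))])     # homogeneous joints, view 1
    x2 = np.hstack([pose2, np.ones((J, 1))])     # homogeneous joints, view 2
    lines2 = (F @ x1.T).T                        # epipolar lines of x1 in view 2
    lines1 = (F.T @ x2.T).T                      # epipolar lines of x2 in view 1
    d2 = np.abs(np.sum(lines2 * x2, axis=1)) / np.linalg.norm(lines2[:, :2], axis=1)
    d1 = np.abs(np.sum(lines1 * x1, axis=1)) / np.linalg.norm(lines1[:, :2], axis=1)
    return 0.5 * float((d1 + d2).mean())
```

Pairs of 2D poses whose affinity falls below a chosen threshold can then be grouped across views, greedily or via graph partitioning, before the grouped poses are triangulated into 3D.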