Multi-View Person Matching and 3D Pose Estimation with Arbitrary
Uncalibrated Camera Networks
- URL: http://arxiv.org/abs/2312.01561v1
- Date: Mon, 4 Dec 2023 01:28:38 GMT
- Title: Multi-View Person Matching and 3D Pose Estimation with Arbitrary
Uncalibrated Camera Networks
- Authors: Yan Xu, Kris Kitani
- Abstract summary: Cross-view person matching and 3D human pose estimation in multi-camera networks are difficult when the cameras are extrinsically uncalibrated.
Existing efforts require large amounts of 3D data for training neural networks or known camera poses for geometric constraints to solve the problem.
We present a method, PME, that solves the two tasks without requiring either type of information.
- Score: 36.49915280876899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view person matching and 3D human pose estimation in multi-camera
networks are particularly difficult when the cameras are extrinsically
uncalibrated. Existing efforts generally require large amounts of 3D data for
training neural networks or known camera poses for geometric constraints to
solve the problem. However, camera poses and 3D data annotation are usually
expensive and not always available. We present a method, PME, that solves the
two tasks without requiring either type of information. Our idea is to address
cross-view person matching as a clustering problem using each person as a
cluster center, then obtain correspondences from person matches, and estimate
3D human poses through multi-view triangulation and bundle adjustment. We solve
the clustering problem by introducing a "size constraint" using the number of
cameras and a "source constraint" using the fact that two people from the same
camera view should not match, to narrow the solution space to a small feasible
region. The 2D human poses used in clustering are obtained through a
pre-trained 2D pose detector, so our method does not require expensive 3D
training data for each new scene. We extensively evaluate our method on three
open datasets and two indoor and outdoor datasets collected using arbitrarily
set cameras. Our method outperforms other methods by a large margin on
cross-view person matching, reaches SOTA performance on 3D human pose
estimation without using either camera poses or 3D training data, and shows
good generalization ability across five datasets of various environment
settings.
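To ground the matching-as-clustering idea from the abstract, here is a minimal sketch under stated assumptions: per-camera 2D person detections are represented by appearance feature vectors, each cluster is one person, and a per-view Hungarian assignment enforces the "source constraint" (at most one detection from any camera per cluster), which together with the number of cameras also bounds cluster size (the "size constraint"). The function name, running-mean centers, and greedy view order are illustrative choices rather than the paper's actual solver; the resulting clusters would then feed multi-view triangulation and bundle adjustment as described above.

```python
# Minimal sketch (not the paper's implementation): cross-view person matching
# as constrained clustering. Each cluster is one person identity; a per-view
# Hungarian assignment enforces the "source constraint" (one detection per
# camera per cluster), which also bounds cluster size by the camera count.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_people_across_views(features_per_view):
    """features_per_view: list of (n_people_in_view, feat_dim) arrays,
    e.g. appearance embeddings of 2D person detections in each camera.
    Returns a list of clusters, each a list of (view_idx, detection_idx)."""
    # Seed clusters from the first view: each detection starts its own cluster.
    clusters = [[(0, i)] for i in range(len(features_per_view[0]))]
    centers = [features_per_view[0][i].copy() for i in range(len(clusters))]

    for v in range(1, len(features_per_view)):
        feats = features_per_view[v]
        if len(feats) == 0:
            continue
        # Cost = feature distance between cluster centers and this view's detections.
        cost = np.linalg.norm(
            np.asarray(centers)[:, None, :] - feats[None, :, :], axis=-1
        )
        rows, cols = linear_sum_assignment(cost)  # one-to-one: source constraint
        assigned = set()
        for r, c in zip(rows, cols):
            clusters[r].append((v, c))
            # Running mean keeps the cluster center up to date.
            centers[r] = (centers[r] * (len(clusters[r]) - 1) + feats[c]) / len(clusters[r])
            assigned.add(c)
        # Detections that found no cluster start new ones (people entering the scene).
        for c in range(len(feats)):
            if c not in assigned:
                clusters.append([(v, c)])
                centers.append(feats[c].copy())
    return clusters


# Toy usage: 3 cameras, 2 people, 4-D appearance features.
rng = np.random.default_rng(0)
people = rng.normal(size=(2, 4))
views = [people + 0.05 * rng.normal(size=people.shape) for _ in range(3)]
print(match_people_across_views(views))
```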
Related papers
- Self-learning Canonical Space for Multi-view 3D Human Pose Estimation [57.969696744428475]
Multi-view 3D human pose estimation is naturally superior to single-view estimation.
However, the accurate annotations it relies on are hard to obtain.
We propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet).
CMANet is superior to state-of-the-art methods in extensive quantitative and qualitative analysis.
arXiv Detail & Related papers (2024-03-19T04:54:59Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
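As a rough, hedged illustration of how normalized disparity plus 2D body joints can pin down a person's metric scale: if metric depth is assumed to follow Z = s / disparity with a single unknown scale s, then back-projecting two torso joints and requiring their 3D distance to match an assumed torso length gives s in closed form. The helper name, the 0.5 m torso prior, and the toy intrinsics below are assumptions for illustration, not details from the paper.

```python
# Hedged sketch: recover a metric scale for a normalized disparity map from
# two 2D body joints (e.g. neck and pelvis) and an assumed torso length.
# Assumes metric depth Z = s / disparity with a single unknown scale s.
import numpy as np


def estimate_person_scale(K, joint_px, joint_disp, torso_len_m=0.5):
    """K: 3x3 intrinsics; joint_px: (2, 2) pixel coords of two torso joints;
    joint_disp: (2,) normalized disparity sampled at those pixels.
    Returns the scale s such that Z = s / disparity is in meters."""
    K_inv = np.linalg.inv(K)
    rays = []
    for (u, v), d in zip(joint_px, joint_disp):
        # Back-project each joint to a 3D point, up to the unknown scale s.
        rays.append((1.0 / d) * K_inv @ np.array([u, v, 1.0]))
    # |P1 - P2| = s * |ray1 - ray2| should equal the torso-length prior.
    return torso_len_m / np.linalg.norm(rays[0] - rays[1])


# Toy usage with made-up intrinsics, pixel locations, and disparities.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
s = estimate_person_scale(K, np.array([[640, 300], [645, 420]]), np.array([0.80, 0.82]))
print("metric depth at the first joint:", s / 0.80)
```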
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- Multi-person 3D pose estimation from unlabelled data [2.54990557236581]
We present a model based on Graph Neural Networks capable of predicting the cross-view correspondence of the people in the scenario.
We also present a Multilayer Perceptron that takes the 2D points to yield the 3D poses of each person.
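For the lifting part of this pipeline, a minimal sketch of an MLP that maps matched 2D joints from several views to a 3D pose is shown below; the number of views, joint count, layer widths, and the PyTorch implementation are assumptions for illustration rather than the paper's exact network, and the GNN matching stage is omitted.

```python
# Illustrative lifter (not the paper's exact network): an MLP that maps the
# concatenated 2D joints of one person seen from several views to 3D joints.
import torch
import torch.nn as nn

N_VIEWS, N_JOINTS = 4, 17          # assumed setup
IN_DIM = N_VIEWS * N_JOINTS * 2    # (u, v) per joint per view
OUT_DIM = N_JOINTS * 3             # (x, y, z) per joint

lifter = nn.Sequential(
    nn.Linear(IN_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, OUT_DIM),
)

# Toy forward pass: a batch of 8 people, each with matched 2D joints from 4 views.
joints_2d = torch.randn(8, N_VIEWS, N_JOINTS, 2)
pose_3d = lifter(joints_2d.flatten(1)).view(8, N_JOINTS, 3)
print(pose_3d.shape)  # torch.Size([8, 17, 3])
```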
arXiv Detail & Related papers (2022-12-16T22:03:37Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation from Video [23.00696619207748]
Estimating 3D human poses from video is a challenging problem.
The lack of 3D human pose annotations is a major obstacle for supervised training and for generalization to unseen datasets.
We propose a weakly-supervised training scheme that does not require 3D annotations or calibrated cameras.
arXiv Detail & Related papers (2021-05-14T00:46:48Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
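To make the plane-sweep idea concrete, below is a simplified, hypothetical scorer for a single joint: depth hypotheses are swept along the reference-view ray, each candidate 3D point is projected into the other views, and the depth whose reprojections best agree with the corresponding 2D detections is kept. The actual method fuses learned score volumes in a single shot; this numpy sketch only illustrates the underlying geometry.

```python
# Simplified plane-sweep sketch for a single joint (not the paper's learned
# score volumes): sweep depth hypotheses along the reference-view ray and keep
# the depth whose reprojections best agree with 2D detections in other views.
import numpy as np


def sweep_joint_depth(K_ref, joint_ref_px, other_Ps, other_joints_px,
                      depths=np.linspace(1.0, 8.0, 141)):
    """K_ref: 3x3 intrinsics of the reference view (assumed at the origin).
    other_Ps: list of 3x4 projection matrices of the remaining views.
    other_joints_px: matching 2D joint in each of those views, shape (n, 2)."""
    ray = np.linalg.inv(K_ref) @ np.array([*joint_ref_px, 1.0])
    best_depth, best_cost = None, np.inf
    for z in depths:
        X = np.append(z * ray, 1.0)              # candidate 3D joint (homogeneous)
        cost = 0.0
        for P, uv in zip(other_Ps, other_joints_px):
            proj = P @ X
            proj = proj[:2] / proj[2]            # project into the other view
            cost += np.linalg.norm(proj - uv)    # reprojection error
        if cost < best_cost:
            best_depth, best_cost = z, cost
    return best_depth


# Toy usage: a second camera 0.5 m to the right of the reference camera.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P2 = K @ np.hstack([np.eye(3), [[-0.5], [0.0], [0.0]]])
X_true = np.array([0.2, 0.1, 3.0, 1.0])
uv_ref = (K @ X_true[:3])[:2] / X_true[2]
uv2 = P2 @ X_true
uv2 = uv2[:2] / uv2[2]
print(sweep_joint_depth(K, uv_ref, [P2], [uv2]))  # ~3.0
```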
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views [22.86745487695168]
We propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras.
Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views.
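A hedged sketch of the association step: with calibrated cameras, the affinity between a 2D pose in one view and one in another can be measured by the mean distance of joints to their counterparts' epipolar lines, and pose pairs can then be linked greedily from lowest cost upward. The cost function, threshold, and two-view simplification below are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch of greedy cross-view association using epipolar distances
# (illustrative; not the exact cost or schedule of the paper).
import numpy as np


def epipolar_cost(F, joints_a, joints_b):
    """Mean distance of joints_b (n, 2) to the epipolar lines of joints_a (n, 2)
    under the fundamental matrix F mapping view a to view b."""
    pts_a = np.hstack([joints_a, np.ones((len(joints_a), 1))])
    pts_b = np.hstack([joints_b, np.ones((len(joints_b), 1))])
    lines = pts_a @ F.T                                   # epipolar lines l = F x_a
    num = np.abs(np.sum(pts_b * lines, axis=1))           # |l . x_b|
    return float(np.mean(num / np.linalg.norm(lines[:, :2], axis=1)))


def greedy_match(F, poses_a, poses_b, max_cost=20.0):
    """Greedily pair 2D poses between two views by increasing epipolar cost."""
    costs = [(epipolar_cost(F, pa, pb), i, j)
             for i, pa in enumerate(poses_a) for j, pb in enumerate(poses_b)]
    used_a, used_b, matches = set(), set(), []
    for c, i, j in sorted(costs):
        if c <= max_cost and i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return matches
```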
arXiv Detail & Related papers (2021-01-24T16:28:10Z)
- CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution that operates directly in 3D space and therefore avoids making incorrect decisions in the 2D space.
We propose a Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
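A simplified sketch of what "operating in 3D space" can look like: project every center of a coarse voxel grid into each camera, sample that camera's 2D person-center heatmap at the projection, and average across cameras; high-scoring voxels then serve as person proposals that a regression network (the PRN above) would refine into detailed poses. The grid, heatmap inputs, and function name below are assumptions for illustration.

```python
# Illustrative voxel-space proposal scoring (not the paper's exact network):
# average each camera's 2D heatmap response at the projection of every voxel.
import numpy as np


def voxel_scores(voxel_centers, proj_mats, heatmaps):
    """voxel_centers: (V, 3) 3D points; proj_mats: list of 3x4 matrices;
    heatmaps: list of (H, W) 2D person-center heatmaps, one per camera."""
    V = len(voxel_centers)
    homog = np.hstack([voxel_centers, np.ones((V, 1))])
    scores = np.zeros(V)
    for P, hm in zip(proj_mats, heatmaps):
        uvw = homog @ P.T
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        inside = (uvw[:, 2] > 0) & (u >= 0) & (u < hm.shape[1]) & (v >= 0) & (v < hm.shape[0])
        vals = np.zeros(V)
        vals[inside] = hm[v[inside], u[inside]]
        scores += vals
    scores /= len(proj_mats)
    # Voxels with high scores become person proposals; a regression network
    # would then predict a detailed 3D pose inside each proposal's local volume.
    return scores
```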
arXiv Detail & Related papers (2020-04-13T23:50:01Z)