Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras
- URL: http://arxiv.org/abs/2401.15616v1
- Date: Sun, 28 Jan 2024 10:06:17 GMT
- Title: Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras
- Authors: Yu-Jhe Li, Yan Xu, Rawal Khirodkar, Jinhyung Park, Kris Kitani
- Abstract summary: We tackle the task of multi-view, multi-person 3D human pose estimation from a limited number of uncalibrated depth cameras.
We propose to leverage sparse, uncalibrated depth cameras providing RGBD video streams for 3D human pose estimation.
- Score: 36.59439020480503
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We tackle the task of multi-view, multi-person 3D human pose estimation from
a limited number of uncalibrated depth cameras. Recently, many approaches have
been proposed for 3D human pose estimation from multi-view RGB cameras.
However, these works (1) assume that the number of RGB camera views is large
enough for 3D reconstruction, (2) assume that the cameras are calibrated, and
(3) rely on ground-truth 3D poses for training their regression models. In this work, we propose to
leverage sparse, uncalibrated depth cameras providing RGBD video streams for 3D
human pose estimation. We present a simple pipeline for Multi-View Depth Human
Pose Estimation (MVD-HPE) for jointly predicting the camera poses and 3D human
poses without training a deep 3D human pose regression model. This framework
utilizes 3D Re-ID appearance features from RGBD images to formulate more
accurate correspondences (for deriving camera positions) compared to using
RGB-only features. We further propose (1) depth-guided camera-pose estimation
by leveraging 3D rigid transformations as guidance and (2) depth-constrained 3D
human pose estimation by utilizing depth-projected 3D points as an alternative
objective for optimization. To evaluate the proposed pipeline, we collect three
sets of RGBD videos recorded from multiple sparse-view depth cameras and
manually annotate ground-truth 3D poses. Experiments
show that our proposed method outperforms the current 3D human pose
regression-free pipelines in terms of both camera pose estimation and 3D human
pose estimation.
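The depth-guided camera-pose estimation step above aligns 3D point sets back-projected from depth, matched across views via Re-ID features. The paper does not give its exact solver, but the standard least-squares rigid alignment it leverages is the Kabsch/Umeyama algorithm; a minimal sketch (names and the toy data are illustrative, not from the paper):

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with R @ src[i] + t ~ dst[i],
    via the Kabsch algorithm (SVD of the cross-covariance matrix)."""
    c_src = src.mean(axis=0)                 # centroids
    c_dst = dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy usage: recover a known rotation/translation between two "views".
rng = np.random.default_rng(0)
pts_view_a = rng.normal(size=(40, 3))        # depth-projected 3D points, view A
theta = np.pi / 6                            # 30-degree rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
pts_view_b = pts_view_a @ R_true.T + t_true  # same points seen from view B
R_est, t_est = rigid_transform(pts_view_a, pts_view_b)
```

With clean correspondences the estimate matches the ground-truth transform to numerical precision; in practice, noisy Re-ID matches would call for a robust wrapper (e.g. RANSAC) around this solver.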
Related papers
- EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans [5.047302480095444]
Monocular Human Pose Estimation aims at determining the 3D positions of human joints from a single 2D image captured by a camera.
In this study, instead of relying on approximations, we advocate for utilizing the full perspective camera model.
We introduce the EPOCH framework, comprising two main components: the pose lifter network (LiftNet) and the pose regressor network (RegNet).
arXiv Detail & Related papers (2024-06-28T08:16:54Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by
Leveraging In-the-wild 2D Annotations [25.05308239278207]
We present CameraPose, a weakly-supervised framework for 3D human pose estimation from a single image.
By adding a camera parameter branch, any in-the-wild 2D annotations can be fed into our pipeline to boost the training diversity.
We also introduce a refinement network module with confidence-guided loss to further improve the quality of noisy 2D keypoints extracted by 2D pose estimators.
arXiv Detail & Related papers (2023-01-08T05:07:41Z) - SPEC: Seeing People in the Wild with an Estimated Camera [64.85791231401684]
We introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image.
We train a neural network to estimate the field of view, camera pitch, and camera roll from an input image.
We then train a novel network that concatenates the camera calibration with the image features and uses these together to regress 3D body shape and pose.
arXiv Detail & Related papers (2021-10-01T19:05:18Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the
Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z) - TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation
from Video [23.00696619207748]
Estimating 3D human poses from video is a challenging problem.
The lack of 3D human pose annotations is a major obstacle for supervised training and for generalization to unseen datasets.
We propose a weakly-supervised training scheme that does not require 3D annotations or calibrated cameras.
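Triangulation here means recovering a 3D joint from its 2D detections in multiple views. A minimal direct linear transform (DLT) sketch for two views follows; this is the generic textbook method, not TriPose's specific self-supervised scheme:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image points.
    Each view contributes two linear constraints x * (P[2] @ X) = P[i] @ X;
    the homogeneous solution is the right singular vector of the smallest
    singular value."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # homogeneous 4-vector, defined up to scale
    return X[:3] / X[3]

# Toy usage: two identity-intrinsics cameras, the second shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([1.0, 2.0, 5.0])
x1 = (P1 @ np.append(X_true, 1.0))[:2] / (P1 @ np.append(X_true, 1.0))[2]
x2 = (P2 @ np.append(X_true, 1.0))[:2] / (P2 @ np.append(X_true, 1.0))[2]
X_est = triangulate(P1, P2, x1, x2)
```

With noise-free projections the DLT solution recovers the 3D point exactly; weakly-supervised schemes like TriPose use such triangulated points as pseudo ground truth in place of 3D annotations.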
arXiv Detail & Related papers (2021-05-14T00:46:48Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose
Estimation [18.103595280706593]
We leverage recent advances in reliable CNN-based 2D pose estimation to estimate the 3D pose of people from depth images.
Our approach achieves very competitive results both in accuracy and speed on two public datasets.
arXiv Detail & Related papers (2020-11-10T10:08:13Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.