VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the
Wild
- URL: http://arxiv.org/abs/2108.02452v1
- Date: Thu, 5 Aug 2021 08:35:44 GMT
- Title: VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the
Wild
- Authors: Yifu Zhang and Chunyu Wang and Xinggang Wang and Wenyu Liu and Wenjun
Zeng
- Abstract summary: We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
- Score: 98.69191256693703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present VoxelTrack for multi-person 3D pose estimation and tracking from a
few cameras which are separated by wide baselines. It employs a multi-branch
network to jointly estimate 3D poses and re-identification (Re-ID) features for
all people in the environment. In contrast to previous efforts which require
establishing cross-view correspondence based on noisy 2D pose estimates, it
directly estimates and tracks 3D poses from a 3D voxel-based representation
constructed from multi-view images. We first discretize the 3D space by regular
voxels and compute a feature vector for each voxel by averaging the body joint
heatmaps that are inversely projected from all views. We estimate 3D poses from
the voxel representation by predicting whether each voxel contains a particular
body joint. Similarly, a Re-ID feature is computed for each voxel which is used
to track the estimated 3D poses over time. The main advantage of the approach
is that it avoids making any hard decisions based on individual images. The
approach can robustly estimate and track 3D poses even when people are severely
occluded in some cameras. It outperforms the state-of-the-art methods by a
large margin on three public datasets including Shelf, Campus and CMU Panoptic.
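
The voxel feature construction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`make_camera`, `project_points`, `build_voxel_features`, `render_gaussian_heatmap`), the pinhole camera model, the nearest-pixel sampling, and the choice to average only over views in which a voxel is visible are all assumptions made for illustration.

```python
import numpy as np


def make_camera(K, R, t):
    """Build a 3x4 pinhole projection matrix P = K [R | t] (assumed camera model)."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project_points(P, points):
    """Project (N, 3) world points; return (N, 2) pixel coords and (N,) depths."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    uvw = homogeneous @ P.T
    depth = uvw[:, 2]
    safe_depth = np.where(depth > 1e-9, depth, 1.0)  # avoid division by zero
    return uvw[:, :2] / safe_depth[:, None], depth


def build_voxel_features(heatmaps, cameras, grid_min, grid_max, resolution):
    """Average per-joint 2D heatmaps over views at every voxel center.

    heatmaps: list of (J, H, W) arrays, one per view.
    cameras:  list of 3x4 projection matrices, one per view.
    Returns a (J, X, Y, Z) feature volume.
    """
    axes = [np.linspace(grid_min[i], grid_max[i], resolution[i]) for i in range(3)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    num_joints = heatmaps[0].shape[0]
    total = np.zeros((num_joints, len(centers)))
    views_seen = np.zeros(len(centers))
    for heatmap, P in zip(heatmaps, cameras):
        _, height, width = heatmap.shape
        uv, depth = project_points(P, centers)
        # Keep voxels in front of the camera that land inside the image.
        valid = (
            (depth > 1e-9)
            & (uv[:, 0] >= -0.5) & (uv[:, 0] < width - 0.5)
            & (uv[:, 1] >= -0.5) & (uv[:, 1] < height - 0.5)
        )
        # Nearest-pixel lookup (a simplification; bilinear sampling is also common).
        u = np.rint(uv[valid, 0]).astype(int)
        v = np.rint(uv[valid, 1]).astype(int)
        total[:, valid] += heatmap[:, v, u]
        views_seen += valid
    volume = total / np.maximum(views_seen, 1.0)
    return volume.reshape((num_joints, *resolution))


def render_gaussian_heatmap(center_uv, height, width, sigma=2.0):
    """Synthetic stand-in for a 2D joint detector output (hypothetical helper)."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - center_uv[0]) ** 2 + (ys - center_uv[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


# Toy scene: two cameras 90 degrees apart observing a single joint.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])
R_front = np.eye(3)
R_side = np.array([[0.0, 0.0, -1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
cameras = [make_camera(K, R_front, t), make_camera(K, R_side, t)]

joint = np.array([[0.2, -0.2, 0.4]])
heatmaps = []
for P in cameras:
    uv, _ = project_points(P, joint)
    heatmaps.append(render_gaussian_heatmap(uv[0], 64, 64)[None])  # shape (1, H, W)

volume = build_voxel_features(heatmaps, cameras, (-1, -1, -1), (1, 1, 1), (11, 11, 11))
peak = np.unravel_index(volume[0].argmax(), volume[0].shape)
print(peak)
```

In this toy scene the strongest response should fall on the voxel whose center coincides with the joint, because only that voxel projects onto the heatmap peak in both views. The `argmax` here is only a sanity check: according to the abstract, VoxelTrack instead predicts whether each voxel contains a particular body joint and computes a Re-ID feature per voxel for tracking.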
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Self-learning Canonical Space for Multi-view 3D Human Pose Estimation [57.969696744428475]
Multi-view 3D human pose estimation is naturally superior to its single-view counterpart.
Accurate annotation of this information is hard to obtain.
We propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet).
CMANet is superior to state-of-the-art methods in extensive quantitative and qualitative analysis.
arXiv Detail & Related papers (2024-03-19T04:54:59Z)
- Real-Time Multi-View 3D Human Pose Estimation using Semantic Feedback to
Smart Edge Sensors [28.502280038100167]
2D joint detection for each camera view is performed locally on a dedicated embedded inference processor.
3D poses are recovered from 2D joints on a central backend, based on triangulation and a body model.
The whole pipeline is capable of real-time operation.
arXiv Detail & Related papers (2021-06-28T14:00:00Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- PandaNet : Anchor-Based Single-Shot Multi-Person 3D Pose Estimation [35.791868530073955]
We present PandaNet, a new single-shot, anchor-based and multi-person 3D pose estimation approach.
The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass.
It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box.
arXiv Detail & Related papers (2021-01-07T10:32:17Z)
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild
Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates in the 3D space and therefore avoids making incorrect decisions in the 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A
Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by first detecting 2D poses from the two signals, and then lifting them to the 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)