PandaNet : Anchor-Based Single-Shot Multi-Person 3D Pose Estimation
- URL: http://arxiv.org/abs/2101.02471v1
- Date: Thu, 7 Jan 2021 10:32:17 GMT
- Title: PandaNet : Anchor-Based Single-Shot Multi-Person 3D Pose Estimation
- Authors: Abdallah Benzine, Florian Chabot, Bertrand Luvison, Quoc Cuong Pham,
Catherine Achard
- Abstract summary: We present PandaNet, a new single-shot, anchor-based and multi-person 3D pose estimation approach.
The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass.
It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box.
- Score: 35.791868530073955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, several deep learning models have been proposed for 3D human pose
estimation. Nevertheless, most of these approaches only focus on the
single-person case or estimate the 3D poses of a few people at high resolution.
Furthermore, many applications such as autonomous driving or crowd analysis
require pose estimation of a large number of people, possibly at low resolution.
In this work, we present PandaNet (Pose estimAtioN and Detection Anchor-based
Network), a new single-shot, anchor-based and multi-person 3D pose estimation
approach. The proposed model performs bounding box detection and, for each
detected person, 2D and 3D pose regression in a single forward pass. It does
not need any post-processing to regroup joints since the network predicts a
full 3D pose for each bounding box and allows the pose estimation of a possibly
large number of people at low resolution. To handle overlapping people, we
introduce a Pose-Aware Anchor Selection strategy. Moreover, since an imbalance
exists between different person sizes in the image, and joint coordinates have
different uncertainties depending on these sizes, we propose a method that
automatically optimizes the weights associated with different person scales and
joints for efficient training. PandaNet surpasses previous single-shot methods on
several challenging datasets: a virtual but highly realistic multi-person urban
dataset (JTA Dataset), and two real-world 3D multi-person datasets (CMU
Panoptic and MuPoTS-3D).
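To make the single-shot design concrete, the sketch below illustrates how a dense anchor-based head could jointly regress a person score, a bounding box, a 2D pose and a 3D pose for every anchor, together with a learned log-variance weighting of per-group losses in the spirit of the automatic weight optimization described in the abstract. This is a minimal sketch under assumptions, not the authors' implementation: the backbone, layer sizes, joint count, and the grouping of losses by person scale and joint are all hypothetical.

```python
# Hypothetical sketch of a single-shot, anchor-based pose head (not PandaNet's code).
# For every anchor location, the head regresses a score, a box, a 2D pose and a 3D pose,
# so no joint-grouping post-processing is needed once anchors are selected.
import torch
import torch.nn as nn


class AnchorPoseHead(nn.Module):
    def __init__(self, in_channels: int, num_anchors: int, num_joints: int = 17):
        super().__init__()
        self.num_anchors = num_anchors
        self.num_joints = num_joints
        # One 1x1 convolution per output type, predicted densely for every anchor.
        self.cls = nn.Conv2d(in_channels, num_anchors, 1)                      # person / background score
        self.box = nn.Conv2d(in_channels, num_anchors * 4, 1)                  # box offsets (dx, dy, dw, dh)
        self.pose2d = nn.Conv2d(in_channels, num_anchors * num_joints * 2, 1)  # (x, y) per joint
        self.pose3d = nn.Conv2d(in_channels, num_anchors * num_joints * 3, 1)  # (x, y, z) per joint

    def forward(self, feats: torch.Tensor) -> dict:
        b, _, h, w = feats.shape
        return {
            "score": self.cls(feats).view(b, self.num_anchors, h, w),
            "box": self.box(feats).view(b, self.num_anchors, 4, h, w),
            "pose2d": self.pose2d(feats).view(b, self.num_anchors, self.num_joints, 2, h, w),
            "pose3d": self.pose3d(feats).view(b, self.num_anchors, self.num_joints, 3, h, w),
        }


class AutoWeightedLoss(nn.Module):
    """Learned log-variance weighting over loss groups (e.g. person scales or joints).

    Analogous in spirit to the paper's automatic weight optimization; the exact
    grouping and formulation here are assumptions.
    """

    def __init__(self, num_groups: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_groups))

    def forward(self, group_losses: torch.Tensor) -> torch.Tensor:
        # group_losses: tensor of shape (num_groups,) with per-group regression losses.
        precision = torch.exp(-self.log_vars)
        return (precision * group_losses + self.log_vars).sum()
```

In such a design, each retained anchor already carries a full 3D pose for its person, which is why no post-processing is needed to regroup joints across detections.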
Related papers
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single-person pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [46.38290735670527]
Recovering multi-person 3D poses from a single RGB image is a severely ill-conditioned problem.
Recent works have shown promising results by simultaneously reasoning for different people but in all cases within a local neighborhood.
PI-Net introduces a self-attention block to reason for all people in the image at the same time and refine potentially noisy initial 3D poses.
In this paper, we model people interactions as a whole, independently of their number, and in a permutation-invariant manner, building upon the Set Transformer.
arXiv Detail & Related papers (2022-04-11T07:23:54Z)
- Shape-aware Multi-Person Pose Estimation from Multi-View Images [47.13919147134315]
Our proposed coarse-to-fine pipeline first aggregates noisy 2D observations from multiple camera views into 3D space.
The final pose estimates are attained from a novel optimization scheme which links high-confidence multi-view 2D observations and 3D joint candidates.
arXiv Detail & Related papers (2021-10-05T20:04:21Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates directly in 3D space, thereby avoiding incorrect decisions in the 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.