Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic
Projection
- URL: http://arxiv.org/abs/2207.10955v1
- Date: Fri, 22 Jul 2022 09:10:01 GMT
- Title: Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic
Projection
- Authors: Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang
- Abstract summary: Voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras.
We present Faster VoxelPose, which addresses the challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes.
The method is free of costly 3D-CNNs and runs ten times faster than VoxelPose.
- Score: 24.964926464973026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from heavy computation burdens, especially in large scenes. We present Faster VoxelPose, which addresses this challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them separately. To that end, we first localize each person with a 3D bounding box, estimating a 2D box and its height from the volume features projected onto the xy-plane and the z-axis, respectively. Then, for each person, we estimate partial joint coordinates from the three coordinate planes separately and fuse them to obtain the final 3D pose. The method is free of costly 3D-CNNs, runs ten times faster than VoxelPose, and achieves accuracy competitive with state-of-the-art methods, demonstrating its potential for real-time applications.
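To make the core decomposition concrete, the following is a minimal NumPy sketch of the orthographic projection step: a hypothetical voxel feature volume is collapsed onto the xy-, xz- and yz-planes by max-pooling, and per-axis coordinates are read off the resulting 2D maps with a soft-argmax. All shapes, the pooling operator, and the averaging fusion are illustrative assumptions, not the authors' implementation, which applies learned 2D networks to the projected planes.

```python
import numpy as np

# Hypothetical feature volume aggregated from multi-view 2D heatmaps,
# shaped (C, X, Y, Z) over a discretized capture space.
C, X, Y, Z = 32, 80, 80, 20
volume = np.random.rand(C, X, Y, Z).astype(np.float32)

# Orthographic projection: collapse the volume onto the three coordinate
# planes by max-pooling along the left-out axis. Each result is a 2D map
# that cheap 2D CNNs can process in place of a costly 3D-CNN.
feat_xy = volume.max(axis=3)  # (C, X, Y): bird's-eye view
feat_xz = volume.max(axis=2)  # (C, X, Z)
feat_yz = volume.max(axis=1)  # (C, Y, Z)

def soft_argmax_1d(scores):
    """Differentiable coordinate readout from a 1D score vector."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return float((p * np.arange(len(scores))).sum())

# Toy per-joint score maps (in the paper these come from 2D CNN heads).
joint_xy = feat_xy.mean(axis=0)  # (X, Y)
joint_xz = feat_xz.mean(axis=0)  # (X, Z)
joint_yz = feat_yz.mean(axis=0)  # (Y, Z)

# Each axis is observed by two planes; fuse the two estimates (here by
# simple averaging, standing in for the paper's learned fusion).
x = 0.5 * (soft_argmax_1d(joint_xy.max(axis=1)) + soft_argmax_1d(joint_xz.max(axis=1)))
y = 0.5 * (soft_argmax_1d(joint_xy.max(axis=0)) + soft_argmax_1d(joint_yz.max(axis=1)))
z = 0.5 * (soft_argmax_1d(joint_xz.max(axis=0)) + soft_argmax_1d(joint_yz.max(axis=0)))
print("fused voxel coordinates:", (x, y, z))
```

The speedup follows from this structure: 2D convolutions over three planes scale with the plane areas, whereas 3D convolutions scale with the full volume.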
Related papers
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized
Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of 2D features projected along rays into 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance of the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to the normalized device coordinates space.
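As a rough illustration of the lifting step, the sketch below extends a 2D feature map into a depth-extended volume; uniform copying along depth bins stands in for the learned restoration of the depth dimension, and all shapes are made up.

```python
import numpy as np

# Hypothetical 2D feature map from an image backbone: (C, H, W).
C, H, W, D = 64, 24, 32, 16
feat_2d = np.random.rand(C, H, W).astype(np.float32)

# Lift to a (C, D, H, W) volume in normalized device coordinates by
# repeating features across D depth bins; a learned depth distribution
# would weight each bin instead of copying uniformly.
volume_ndc = np.broadcast_to(feat_2d[:, None], (C, D, H, W)).copy()
print(volume_ndc.shape)  # (64, 16, 24, 32)
```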
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in the camera frustum.
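A toy sketch of dense point-wise voting: 3D points sampled in the camera frustum each cast a vote (an offset toward a joint), and the votes are aggregated with confidence weights. The offset and confidence functions below are stand-ins for the learned implicit field, and a synthetic joint is used so the aggregation can be checked.

```python
import numpy as np

# Sample hypothetical 3D query points inside the camera frustum.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(4096, 3)).astype(np.float32)

# Stand-in for the learned field: each point predicts an offset toward
# the joint plus a confidence (nearer points vote more strongly).
true_joint = np.array([0.2, -0.1, 0.5], dtype=np.float32)
offsets = true_joint - points
confidence = np.exp(-np.linalg.norm(offsets, axis=1))

# Aggregate the dense votes into one camera-space joint estimate.
votes = points + offsets
w = confidence / confidence.sum()
joint_estimate = (w[:, None] * votes).sum(axis=0)
print(joint_estimate)  # recovers [0.2, -0.1, 0.5]
```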
arXiv Detail & Related papers (2023-05-07T16:51:34Z)
- SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling [75.957103837167]
Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape.
Existing works employ a global feature extracted from the sketch to directly predict the 3D coordinates, but they usually lose fine details and are thus not faithful to the input sketch.
arXiv Detail & Related papers (2022-08-14T16:37:51Z)
- SPGNet: Spatial Projection Guided 3D Human Pose Estimation in Low Dimensional Space [14.81199315166042]
We propose a method for 3D human pose estimation that incorporates multi-dimensional re-projection into supervised learning.
On the Human3.6M dataset, our approach outperforms many state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-06-04T00:51:00Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms state-of-the-art methods by a large margin on three public datasets: Shelf, Campus, and CMU Panoptic.
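A minimal PyTorch sketch of the multi-branch idea: a shared 3D trunk feeds one head for per-joint heatmaps and one for Re-ID embeddings, so pose and identity are predicted jointly. Layer sizes and channel counts are illustrative, not VoxelTrack's actual architecture.

```python
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    """Shared trunk with separate pose and Re-ID branches (illustrative)."""
    def __init__(self, in_ch=32, joints=15, embed=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Conv3d(in_ch, 64, 3, padding=1), nn.ReLU())
        self.pose_head = nn.Conv3d(64, joints, 1)  # per-joint 3D heatmaps
        self.reid_head = nn.Conv3d(64, embed, 1)   # per-voxel Re-ID features

    def forward(self, vol):
        h = self.trunk(vol)
        return self.pose_head(h), self.reid_head(h)

net = TwoBranchNet()
heatmaps, embeddings = net(torch.randn(1, 32, 16, 16, 8))
print(heatmaps.shape, embeddings.shape)  # (1, 15, ...) and (1, 64, ...)
```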
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
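A toy NumPy sketch of the plane-sweep idea for a single joint: back-project a reference-view detection onto a sweep of depth planes, reproject each hypothesis into a second view, and keep the depth that best agrees with the detection there. Cameras and detections are synthetic, and the actual method aggregates score volumes across views with a network rather than matching single points.

```python
import numpy as np

# Toy calibrated cameras (3x4 projection matrices, identity intrinsics).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Ground-truth 3D joint and its 2D detections in both views.
X_true = np.array([0.3, 0.2, 4.0])
x1, x2 = project(P1, X_true), project(P2, X_true)

# Sweep depth planes: back-project x1 to each depth, reproject into view
# 2, and score the hypothesis by its distance to the detection there.
depths = np.linspace(1.0, 8.0, 141)
best = min(depths, key=lambda d: np.linalg.norm(project(P2, np.array([*x1, 1.0]) * d) - x2))
print("recovered depth:", best)  # ~4.0
```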
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Light3DPose: Real-time Multi-Person 3D Pose Estimation from Multiple Views [5.510992382274774]
We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views.
Our architecture aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene.
The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene.
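A naive NumPy sketch of the aggregation step: voxel centers are projected into every calibrated view, and the sampled 2D features are averaged into a scene volume. The loop makes the bottom-up property visible, since cost scales with the volume size rather than the number of people. Cameras, shapes, and nearest-neighbor sampling are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2D feature maps from V calibrated views: (V, C, H, W).
V, C, H, W = 3, 16, 64, 64
feats = np.random.rand(V, C, H, W).astype(np.float32)
Ps = [np.hstack([np.eye(3), np.array([[dx], [0.0], [5.0]])]) for dx in (-1.0, 0.0, 1.0)]

xs = np.linspace(-1.0, 1.0, 20)
volume = np.zeros((C, 20, 20, 20), dtype=np.float32)
for i, x in enumerate(xs):
    for j, y in enumerate(xs):
        for k, z in enumerate(xs):
            acc = np.zeros(C, dtype=np.float32)
            for v in range(V):
                p = Ps[v] @ np.array([x, y, z, 1.0])
                u = p[:2] / p[2]  # project the voxel center into view v
                # nearest-neighbor sample, mapping [-1, 1] to pixel indices
                uu = int(np.clip((u[0] + 1) * 0.5 * (W - 1), 0, W - 1))
                vv = int(np.clip((u[1] + 1) * 0.5 * (H - 1), 0, H - 1))
                acc += feats[v, :, vv, uu]
            volume[:, i, j, k] = acc / V  # average across views
print(volume.shape)  # (16, 20, 20, 20)
```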
arXiv Detail & Related papers (2020-04-06T14:12:19Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
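A minimal PyTorch sketch of this decoding idea: per-view features are fused into a single latent pose code, and a decoder conditioned on each camera's (flattened) projection matrix emits per-view 2D joints. Fusion by averaging and all shapes are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

V, F, J = 4, 128, 15                      # views, feature dim, joints
per_view_feats = torch.randn(V, F)        # from a per-view image encoder
cams = torch.randn(V, 12)                 # flattened 3x4 projection matrices

# Fuse views into one latent code, disentangled from any single camera.
latent = per_view_feats.mean(dim=0)

# Decoder conditioned on the camera operator produces per-view 2D joints.
decoder = nn.Sequential(nn.Linear(F + 12, 256), nn.ReLU(), nn.Linear(256, J * 2))
detections_2d = torch.stack(
    [decoder(torch.cat([latent, cams[v]])).view(J, 2) for v in range(V)]
)
print(detections_2d.shape)  # (4, 15, 2)
```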
arXiv Detail & Related papers (2020-04-05T12:52:29Z)