PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds
- URL: http://arxiv.org/abs/2308.14492v1
- Date: Mon, 28 Aug 2023 11:10:14 GMT
- Title: PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds
- Authors: Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan
Zhang, Chen Change Loy, Lei Yang, Ziwei Liu
- Abstract summary: We propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings.
PointHPS iteratively refines point features through a cascaded architecture.
Extensive experiments demonstrate that PointHPS, with its powerful point feature extraction and processing scheme, outperforms state-of-the-art methods.
- Score: 99.60575439926963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human pose and shape estimation (HPS) has attracted increasing attention in
recent years. While most existing studies focus on HPS from 2D images or videos,
which suffer from inherent depth ambiguity, there is a surging need to investigate
HPS from 3D point clouds, as depth sensors are now frequently employed in
commercial devices. However, real-world sensory 3D points are usually noisy and
incomplete, and human bodies can exhibit highly diverse poses.
To tackle these challenges, we propose a principled framework, PointHPS, for
accurate 3D HPS from point clouds captured in real-world settings, which
iteratively refines point features through a cascaded architecture.
Specifically, each stage of PointHPS performs a series of downsampling and
upsampling operations to extract and collate both local and global cues, which
are further enhanced by two novel modules: 1) Cross-stage Feature Fusion (CFF)
for multi-scale feature propagation that allows information to flow effectively
through the stages, and 2) Intermediate Feature Enhancement (IFE) for
body-aware feature aggregation that improves feature quality after each stage.
To facilitate a comprehensive study under various scenarios, we conduct our
experiments on two large-scale benchmarks, comprising i) a dataset that
features diverse subjects and actions captured by real commercial sensors in a
laboratory environment, and ii) controlled synthetic data generated with
realistic considerations such as clothed humans in crowded outdoor scenes.
Extensive experiments demonstrate that PointHPS, with its powerful point
feature extraction and processing scheme, outperforms state-of-the-art methods
by significant margins across the board. Homepage:
https://caizhongang.github.io/projects/PointHPS/.
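To make the cascaded refinement idea in the abstract concrete, here is a minimal, purely illustrative sketch. All names and operations below (`downsample`, `upsample`, `cff`, `ife`, `pointhps_like`) are hypothetical stand-ins assumed for this example; the actual PointHPS uses learned set-abstraction and feature-propagation layers on point features, not these toy functions.

```python
# Toy sketch of a cascaded point-feature refinement loop with
# cross-stage fusion (CFF) and intermediate enhancement (IFE).
# Features are plain floats here; in practice they are per-point vectors.

def downsample(feats):
    # Stand-in for set abstraction: keep every other feature.
    return feats[::2]

def upsample(feats, n):
    # Stand-in for feature propagation: expand back to n features.
    return [feats[i * len(feats) // n] for i in range(n)]

def cff(current, previous):
    # Cross-stage Feature Fusion: merge features from the previous
    # stage so information flows across stages (toy: averaging).
    return [0.5 * (c + p) for c, p in zip(current, previous)]

def ife(feats):
    # Intermediate Feature Enhancement: body-aware aggregation is
    # crudely approximated by mixing in the global mean feature.
    mean = sum(feats) / len(feats)
    return [f + 0.1 * mean for f in feats]

def pointhps_like(feats, num_stages=3):
    prev = feats
    for _ in range(num_stages):
        coarse = downsample(prev)            # extract coarser cues
        refined = upsample(coarse, len(prev))
        refined = cff(refined, prev)         # fuse with previous stage
        prev = ife(refined)                  # enhance before next stage
    return prev

refined = pointhps_like([1.0, 2.0, 3.0, 4.0])
print(len(refined))  # the per-point feature count is preserved
```

The point of the sketch is only the control flow: each stage runs a down/up-sampling pass, fuses its output with the previous stage's features (CFF), and enhances the result before handing it to the next stage (IFE).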
Related papers
- Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative
Convolution Network [80.19054069988559]
We find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency.
We propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth representation in two aspects.
Experiments show that our method achieves significant improvements on three widely used benchmarks.
arXiv Detail & Related papers (2023-08-10T14:32:18Z)
- Surface-biased Multi-Level Context 3D Object Detection [1.9723551683930771]
This work addresses the object detection task in 3D point clouds using a highly efficient, surface-biased feature extraction method (RBGNet, Wang et al., 2022).
We propose a 3D object detector that extracts accurate feature representations of object candidates and leverages self-attention on point patches, object candidates, and the global 3D scene.
arXiv Detail & Related papers (2023-02-13T11:50:04Z)
- PIDS: Joint Point Interaction-Dimension Search for 3D Point Cloud [36.55716011085907]
PIDS is a novel paradigm to jointly explore point interactions and point dimensions to serve semantic segmentation on point cloud data.
We establish a large search space to jointly consider versatile point interactions and point dimensions.
We improve the search space exploration by leveraging predictor-based Neural Architecture Search (NAS) and enhance the quality of prediction.
arXiv Detail & Related papers (2022-11-28T20:35:22Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and in improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition [38.540048855119004]
We propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.
The core component of LATFormer is a module named Locality-Aware Fusion (LAF) which integrates the local features of correlated regions across the two modalities.
In our LATFormer, we utilize the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features.
arXiv Detail & Related papers (2021-09-03T03:23:27Z)
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion [38.05362492645094]
Real point cloud scenes can intuitively capture complex surroundings in the real world, but due to 3D data's raw nature, it is very challenging for machine perception.
We concentrate on the essential visual task, semantic segmentation, for large-scale point cloud data collected in reality.
By comparing with state-of-the-art networks on three different benchmarks, we demonstrate the effectiveness of our network.
arXiv Detail & Related papers (2021-03-12T04:13:20Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection-Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.