Efficient Human Pose Estimation via 3D Event Point Cloud
- URL: http://arxiv.org/abs/2206.04511v1
- Date: Thu, 9 Jun 2022 13:50:20 GMT
- Title: Efficient Human Pose Estimation via 3D Event Point Cloud
- Authors: Jiaan Chen, Hao Shi, Yaozu Ye, Kailun Yang, Lei Sun, Kaiwei Wang
- Abstract summary: We are the first to estimate 2D human pose directly from a 3D event point cloud.
We propose a novel representation of events, the rasterized event point cloud, which aggregates events at the same position within a small time slice.
Based on our method, PointNet with a 2048-point input achieves 82.46mm in MPJPE3D on the DHP19 dataset, with a latency of only 12.29ms.
- Score: 10.628192454401553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human Pose Estimation (HPE) based on RGB images has experienced rapid
development benefiting from deep learning. However, event-based HPE has not
been fully studied, despite its great potential for applications in extreme
scenes and efficiency-critical conditions. In this paper, we are the first to
estimate 2D human pose directly from a 3D event point cloud. We propose a novel
representation of events, the rasterized event point cloud, which aggregates
events at the same position within a small time slice. It maintains the 3D
features from multiple statistical cues and significantly reduces memory
consumption and computational complexity, which proves efficient in our work.
We then feed the rasterized event point cloud to three different backbones,
PointNet, DGCNN, and Point Transformer, each with a two-linear-layer decoder to
predict the locations of human keypoints. We find that with our method,
PointNet achieves promising results at much higher speed, whereas Point
Transformer reaches much higher accuracy, even close to previous
event-frame-based methods. A comprehensive set of results demonstrates that our
proposed method is consistently effective across these 3D backbone models for
event-driven human pose estimation. Based on PointNet with a 2048-point input,
our method achieves 82.46mm in MPJPE3D on the DHP19 dataset with a latency of
only 12.29ms on an NVIDIA Jetson Xavier NX edge computing platform, making it
ideally suited for real-time detection with event cameras. Code will be made
publicly available at https://github.com/MasterHow/EventPointPose.
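To make the representation concrete, here is a minimal sketch of how events might be rasterized into a compact 3D point cloud. The function name, slice count, sensor resolution, and the particular per-cell statistics (event count, mean timestamp, polarity sum) are illustrative assumptions rather than the authors' exact definitions; the linked repository contains the real implementation.

```python
import numpy as np

def rasterize_events(events, num_slices=4, sensor_hw=(260, 346)):
    """Aggregate an event stream into a compact 3D point cloud.

    Sketch of the rasterized-event-point-cloud idea: events sharing
    the same pixel within one time slice collapse to a single point
    that keeps several statistical cues (assumed statistics, not the
    authors' exact definitions).

    events: (N, 4) array of (x, y, t, p), with p in {-1, +1}.
    Returns (M, 6) array of (x, y, slice_idx, count, mean_t, pol_sum).
    """
    x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
    # Normalize timestamps to [0, 1) and assign each event to a slice.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    s = np.minimum((t_norm * num_slices).astype(np.int64), num_slices - 1)

    # One linear key per (slice, y, x) cell, so duplicates can be merged.
    H, W = sensor_hw
    key = (s * H + y.astype(np.int64)) * W + x.astype(np.int64)
    uniq, inv = np.unique(key, return_inverse=True)

    # Accumulate per-cell statistics with scatter-adds.
    count = np.bincount(inv)
    mean_t = np.bincount(inv, weights=t_norm) / count
    pol_sum = np.bincount(inv, weights=p)

    # Decode the cell coordinates back from the linear key.
    xs = uniq % W
    ys = (uniq // W) % H
    ss = uniq // (W * H)
    return np.stack([xs, ys, ss, count, mean_t, pol_sum], axis=1)
```

Each output row is one surviving 3D point (x, y, slice index) carrying per-point statistical features; after sampling a fixed number of points (e.g., 2048), such a cloud can be fed to a PointNet-style backbone with a small linear decoder.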
Related papers
- SPiKE: 3D Human Pose from Point Cloud Sequences [1.8024397171920885]
3D Human Pose Estimation (HPE) is the task of locating keypoints of the human body in 3D space from 2D or 3D representations such as RGB images, depth maps or point clouds.
This paper presents SPiKE, a novel approach to 3D HPE using point cloud sequences.
Experiments on the ITOP benchmark for 3D HPE show that SPiKE reaches 89.19% mAP, achieving state-of-the-art performance with significantly lower inference times.
arXiv Detail & Related papers (2024-09-03T13:22:01Z)
- Improving 3D Pose Estimation for Sign Language [38.20064386142944]
This work addresses 3D human pose reconstruction in single images.
We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose.
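For intuition, here is a generic forward-kinematics sketch, not the paper's exact formulation: composing predicted per-joint rotations down the skeleton tree yields joint positions whose bone lengths are valid by construction. The tree layout and rotation parameterization are assumptions.

```python
import numpy as np

def forward_kinematics(parents, offsets, rotations):
    """Compose per-joint rotations down a kinematic tree.

    parents:   list of parent indices, -1 for the root.
    offsets:   (J, 3) rest-pose bone vectors in parent coordinates.
    rotations: (J, 3, 3) local rotation matrices.
    Returns (J, 3) global joint positions.
    """
    J = len(parents)
    glob_rot = np.empty((J, 3, 3))
    pos = np.empty((J, 3))
    for j in range(J):  # assumes parents[j] < j (topological order)
        if parents[j] == -1:
            glob_rot[j] = rotations[j]
            pos[j] = offsets[j]
        else:
            glob_rot[j] = glob_rot[parents[j]] @ rotations[j]
            pos[j] = pos[parents[j]] + glob_rot[parents[j]] @ offsets[j]
    return pos
```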
arXiv Detail & Related papers (2023-08-18T13:05:10Z)
- DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds [42.64433313672884]
We regularize raw points to a dense format by storing 3D coordinates in 2D grids.
Unlike the sampling operation commonly used in existing works, the dense 2D representation preserves most points.
We also present a novel warping projection technique to alleviate the information loss problem.
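A rough sketch of storing 3D coordinates in a dense 2D grid; the spherical-projection layout below is an assumed, typical spinning-LiDAR arrangement, not necessarily the exact mapping DELFlow uses.

```python
import numpy as np

def points_to_grid(points, H=64, W=2048):
    """Regularize raw points into a dense (H, W, 3) grid of xyz.

    points: (N, 3) xyz. Returns the grid and an (H, W) validity mask.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9
    azimuth = np.arctan2(y, x)        # [-pi, pi)
    elevation = np.arcsin(z / r)
    # Map angles to integer pixel coordinates.
    u = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int) % W
    ele_min, ele_max = elevation.min(), elevation.max()
    v = ((elevation - ele_min) / (ele_max - ele_min + 1e-9) * (H - 1)).astype(int)

    grid = np.zeros((H, W, 3), dtype=points.dtype)
    mask = np.zeros((H, W), dtype=bool)
    grid[v, u] = points   # later points overwrite colliding ones
    mask[v, u] = True
    return grid, mask
```

Unlike subsampling, this keeps most points: only those colliding in the same cell are lost.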
arXiv Detail & Related papers (2023-08-08T16:37:24Z)
- SNAKE: Shape-aware Neural 3D Keypoint Field [62.91169625183118]
Detecting 3D keypoints from point clouds is important for shape reconstruction.
This work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?
We propose a novel unsupervised paradigm named SNAKE, which is short for shape-aware neural 3D keypoint field.
arXiv Detail & Related papers (2022-06-03T17:58:43Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates point cloud and image data at 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization [0.0]
We propose a novel two-stage framework for efficient 3D point cloud object detection.
We parse the raw point cloud data directly in 3D space, yet achieve impressive efficiency and accuracy.
We highlight inference speeds of 75 FPS on the KITTI 3D object detection dataset and 25 FPS on the Waymo Open dataset, with satisfactory accuracy.
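A minimal dynamic-voxelization sketch (a generic CPU illustration, not DV-Det's CUDA implementation): every point is kept and assigned to exactly one voxel, with no fixed-capacity buffers to overflow.

```python
from collections import defaultdict
import numpy as np

def dynamic_voxelize(points, voxel_size=(0.2, 0.2, 0.2)):
    """Group points by voxel index without preallocated voxel buffers.

    points: (N, D) array whose first 3 columns are xyz.
    Returns {voxel_index: mean feature vector} over occupied voxels.
    """
    idx = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    voxels = defaultdict(list)
    for point, key in zip(points, map(tuple, idx)):
        voxels[key].append(point)
    # e.g. mean-pool each voxel's points into one feature vector
    return {k: np.mean(v, axis=0) for k, v in voxels.items()}
```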
arXiv Detail & Related papers (2021-07-27T10:07:39Z)
- Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
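The efficiency argument is easy to see in code: random sampling is O(N), while farthest-point sampling (shown for contrast) costs O(Nm). Both functions below are generic sketches, not RandLA-Net's code.

```python
import numpy as np

def random_downsample(points, m):
    """O(N) sampling: the cheap selection step RandLA-Net builds on."""
    return points[np.random.choice(len(points), size=m, replace=False)]

def farthest_point_downsample(points, m):
    """O(N*m) farthest-point sampling, for contrast; this cost is what
    makes FPS impractical at the million-point scale."""
    n = len(points)
    chosen = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)
    for i in range(1, m):
        # Update each point's distance to its nearest chosen point.
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(dist.argmax())
    return points[chosen]
```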
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
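A minimal sketch of the idea, with assumed layer sizes: rasterize several past frames into a bird's-eye-view grid and convolve jointly over time and space with a 3D convolution.

```python
import torch
import torch.nn as nn

# Input layout: (batch, channels, time, H, W), i.e. occupancy over
# T past frames of a bird's-eye-view raster. Kernel (3, 3, 3) with
# no temporal padding fuses information across neighboring frames.
temporal_bev = nn.Sequential(
    nn.Conv3d(1, 32, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
    nn.ReLU(),
)
x = torch.zeros(2, 1, 5, 256, 256)   # 2 samples, 5 frames, 256x256 BEV
print(temporal_bev(x).shape)          # torch.Size([2, 32, 3, 256, 256])
```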
arXiv Detail & Related papers (2020-12-22T22:43:35Z)
- HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
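A minimal sketch of skeleton-based feature propagation (not HDNet's exact layer): joints exchange features through a fixed, normalized skeleton adjacency matrix.

```python
import torch
import torch.nn as nn

class SkeletonGNNLayer(nn.Module):
    """One round of message passing among joints along skeleton bones."""

    def __init__(self, dim, adjacency):
        super().__init__()
        # Row-normalized adjacency with self-loops, fixed by the skeleton.
        adj = adjacency + torch.eye(adjacency.shape[0])
        self.register_buffer("adj", adj / adj.sum(dim=1, keepdim=True))
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):          # x: (batch, joints, dim)
        return torch.relu(self.linear(self.adj @ x))

# Usage on a toy 3-joint skeleton (illustrative only).
bones = [(0, 1), (1, 2)]
A = torch.zeros(3, 3)
for i, j in bones:
    A[i, j] = A[j, i] = 1.0
layer = SkeletonGNNLayer(dim=64, adjacency=A)
out = layer(torch.randn(8, 3, 64))  # (batch, joints, dim)
```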
arXiv Detail & Related papers (2020-07-17T12:44:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.