Real-time 3D human action recognition based on Hyperpoint sequence
- URL: http://arxiv.org/abs/2111.08492v3
- Date: Mon, 26 Feb 2024 08:48:08 GMT
- Title: Real-time 3D human action recognition based on Hyperpoint sequence
- Authors: Xing Li, Qian Huang, Zhijian Wang, Zhenjie Hou, Tianjin Yang, Zhuang
Miao
- Abstract summary: We propose a lightweight and effective point cloud sequence network for real-time 3D action recognition.
Instead of capturing spatio-temporal local structures, SequentialPointNet encodes the temporal evolution of static appearances to recognize human actions.
Experiments on three widely-used 3D action recognition datasets demonstrate that the proposed SequentialPointNet achieves competitive classification performance while running up to 10X faster than existing approaches.
- Score: 14.218567196931687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time 3D human action recognition has broad industrial applications, such
as surveillance, human-computer interaction, and healthcare monitoring. By
relying on complex spatio-temporal local encoding, most existing point cloud
sequence networks capture spatio-temporal local structures to recognize 3D
human actions. To simplify the point cloud sequence modeling task, we propose a
lightweight and effective point cloud sequence network referred to as
SequentialPointNet for real-time 3D action recognition. Instead of capturing
spatio-temporal local structures, SequentialPointNet encodes the temporal
evolution of static appearances to recognize human actions. Firstly, we define
a novel type of point data, Hyperpoint, to better describe the temporally
changing human appearances. A theoretical foundation is provided to clarify the
information equivalence property for converting point cloud sequences into
Hyperpoint sequences. Secondly, the point cloud sequence modeling task is
decomposed into a Hyperpoint embedding task and a Hyperpoint sequence modeling
task. Specifically, for Hyperpoint embedding, the static point cloud technology
is employed to convert point cloud sequences into Hyperpoint sequences, which
introduces inherent frame-level parallelism; for Hyperpoint sequence modeling,
a Hyperpoint-Mixer module is designed as the basic building block to learn
the spatio-temporal features of human actions. Extensive experiments on three
widely-used 3D action recognition datasets demonstrate that the proposed
SequentialPointNet achieves competitive classification performance while
running up to 10X faster than existing approaches.
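The two-stage decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's actual implementation: the PointNet-style per-frame embedding, the MLP-Mixer-style temporal/channel mixing, the pooling choices, and all layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hyperpoint_embed(frame_points, W):
    """Embed one point cloud frame into a single Hyperpoint vector.

    A PointNet-style sketch: a shared per-point linear map followed by
    symmetric max pooling, so the result is invariant to point order.
    """
    feats = np.maximum(frame_points @ W, 0.0)   # (N, D) per-point features
    return feats.max(axis=0)                    # (D,) Hyperpoint

def mix_hyperpoints(hyperpoints, W_t, W_c):
    """Toy stand-in for a Hyperpoint-Mixer block: alternately mix along
    the temporal axis and the channel axis, then pool over time."""
    h = np.maximum(W_t @ hyperpoints, 0.0)      # mix across frames
    h = np.maximum(h @ W_c, 0.0)                # mix across channels
    return h.max(axis=0)                        # (D,) clip-level feature

# Assumed shapes: T frames, N points per frame, D-dim Hyperpoints.
T, N, D = 8, 128, 64
seq = rng.normal(size=(T, N, 3))
W = rng.normal(size=(3, D))
W_t = rng.normal(size=(T, T))
W_c = rng.normal(size=(D, D))

# Each frame is embedded independently, which is the frame-level
# parallelism the abstract refers to.
hp = np.stack([hyperpoint_embed(f, W) for f in seq])   # (T, D)
clip_feature = mix_hyperpoints(hp, W_t, W_c)           # (D,)
```

The key property the sketch captures is that the spatial step (per-frame embedding) is entirely decoupled from the temporal step (sequence mixing), so frames can be processed in parallel before any temporal modeling happens.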
Related papers
- KAN-HyperpointNet for Point Cloud Sequence-Based 3D Human Action Recognition [14.653930908806357]
We introduce D-Hyperpoint, a novel data type generated through a D-Hyperpoint Embedding module.
D-Hyperpoint encapsulates both regional-momentary motion and global-static posture, effectively summarizing the unit human action at each moment.
We also present a D-Hyperpoint KANMixer module, which is applied to nested groupings of D-Hyperpoints to learn discriminative information.
arXiv Detail & Related papers (2024-09-14T14:11:45Z) - SPiKE: 3D Human Pose from Point Cloud Sequences [1.8024397171920885]
3D Human Pose Estimation (HPE) is the task of locating keypoints of the human body in 3D space from 2D or 3D representations such as RGB images, depth maps or point clouds.
This paper presents SPiKE, a novel approach to 3D HPE using point cloud sequences.
Experiments on the ITOP benchmark for 3D HPE show that SPiKE reaches 89.19% mAP, achieving state-of-the-art performance with significantly lower inference times.
arXiv Detail & Related papers (2024-09-03T13:22:01Z) - 3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion [19.60626235337542]
3DMambaComplete is a point cloud completion network built on the novel Mamba framework.
It encodes point cloud features using Mamba's selection mechanism and predicts a set of Hyperpoints.
A deformation method transforms the 2D mesh representation of HyperPoints into a fine-grained 3D structure for point cloud reconstruction.
arXiv Detail & Related papers (2024-04-10T15:45:03Z) - Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z) - StarNet: Style-Aware 3D Point Cloud Generation [82.30389817015877]
StarNet is able to reconstruct and generate high-fidelity 3D point clouds using a mapping network.
Our framework achieves comparable state-of-the-art performance on various metrics in the point cloud reconstruction and generation tasks.
arXiv Detail & Related papers (2023-03-28T08:21:44Z) - PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences [51.53563462897779]
We propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences.
PST convolution first disentangles space and time in point cloud sequences; a spatial convolution then captures the local structure of points in 3D space, and a temporal convolution models the dynamics of the spatial regions along the time dimension.
We incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner.
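The space-then-time factorization described above can be sketched in a few lines of NumPy. This is not the actual PST convolution (which uses learned point-based kernels over spatio-temporal neighborhoods); the k-nearest-neighbor averaging and the hand-set temporal kernel are illustrative assumptions.

```python
import numpy as np

def spatial_step(points, feats, k):
    """Spatial aggregation within one frame: for each point, average the
    features of its k nearest neighbours (a stand-in for a learned
    point-based spatial convolution)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]          # (N, k) neighbour indices
    return feats[idx].mean(axis=1)              # (N, D) aggregated features

def temporal_step(seq_feats, kernel):
    """Temporal convolution along the frame axis for each point track
    (valid padding, so the output has T - K + 1 frames)."""
    T, N, D = seq_feats.shape
    K = len(kernel)
    out = np.zeros((T - K + 1, N, D))
    for t in range(T - K + 1):
        for j, w in enumerate(kernel):
            out[t] += w * seq_feats[t + j]
    return out

rng = np.random.default_rng(1)
pts = rng.normal(size=(32, 3))                  # one frame, 32 points
f = rng.normal(size=(32, 16))                   # per-point features
spatial_out = spatial_step(pts, f, k=4)         # (32, 16)

track_feats = rng.normal(size=(6, 32, 16))      # 6 frames of features
temporal_out = temporal_step(track_feats, [0.25, 0.5, 0.25])  # (4, 32, 16)
```

Applying the spatial step per frame and the temporal step across frames mirrors the disentanglement idea: each operator only has to handle one axis of variation.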
arXiv Detail & Related papers (2022-05-27T02:14:43Z) - PointAttN: You Only Need Attention for Point Cloud Completion [89.88766317412052]
Point cloud completion refers to completing 3D shapes from partial 3D point clouds.
We propose a novel neural network that processes point clouds in a per-point manner, eliminating the need for kNN operations.
The proposed framework, namely PointAttN, is simple, neat and effective, which can precisely capture the structural information of 3D shapes.
arXiv Detail & Related papers (2022-03-16T09:20:01Z) - Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences [20.697745449159097]
Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around each point.
The proposed method makes better use of the structured information within the local region and learns spatial-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z) - SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification [93.54286830844134]
We propose a method for 3D object completion and classification based on point clouds.
For the decoder stage, we propose regional convolutions, a novel operator aimed at maximizing the global activation entropy.
We evaluate our approach on different 3D tasks such as object completion and classification, achieving state-of-the-art accuracy.
arXiv Detail & Related papers (2020-08-17T14:32:35Z) - CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations [72.4716073597902]
We propose a method to learn Canonical Spatiotemporal Point Cloud Representations of dynamically moving or evolving objects.
We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation.
arXiv Detail & Related papers (2020-08-06T17:58:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.