Real-time 3D human action recognition based on Hyperpoint sequence
- URL: http://arxiv.org/abs/2111.08492v3
- Date: Mon, 26 Feb 2024 08:48:08 GMT
- Title: Real-time 3D human action recognition based on Hyperpoint sequence
- Authors: Xing Li, Qian Huang, Zhijian Wang, Zhenjie Hou, Tianjin Yang, Zhuang
Miao
- Abstract summary: We propose a lightweight and effective point cloud sequence network for real-time 3D action recognition.
Instead of capturing spatio-temporal local structures, SequentialPointNet encodes the temporal evolution of static appearances to recognize human actions.
Experiments on three widely-used 3D action recognition datasets demonstrate that the proposed SequentialPointNet achieves competitive classification performance while running up to 10X faster than existing approaches.
- Score: 14.218567196931687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time 3D human action recognition has broad industrial applications, such
as surveillance, human-computer interaction, and healthcare monitoring. By
relying on complex spatio-temporal local encoding, most existing point cloud
sequence networks capture spatio-temporal local structures to recognize 3D
human actions. To simplify the point cloud sequence modeling task, we propose a
lightweight and effective point cloud sequence network referred to as
SequentialPointNet for real-time 3D action recognition. Instead of capturing
spatio-temporal local structures, SequentialPointNet encodes the temporal
evolution of static appearances to recognize human actions. Firstly, we define
a novel type of point data, Hyperpoint, to better describe the temporally
changing human appearances. A theoretical foundation is provided to clarify the
information equivalence property for converting point cloud sequences into
Hyperpoint sequences. Secondly, the point cloud sequence modeling task is
decomposed into a Hyperpoint embedding task and a Hyperpoint sequence modeling
task. Specifically, for Hyperpoint embedding, the static point cloud technology
is employed to convert point cloud sequences into Hyperpoint sequences, which
introduces inherent frame-level parallelism; for Hyperpoint sequence modeling,
a Hyperpoint-Mixer module is designed as the basic building block to learn
the spatio-temporal features of human actions. Extensive experiments on three
widely-used 3D action recognition datasets demonstrate that the proposed
SequentialPointNet achieves competitive classification performance while running
up to 10X faster than existing approaches.
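As a rough illustration of the two-stage decomposition described above, a minimal NumPy sketch might embed each frame into a Hyperpoint with a PointNet-style shared MLP and max pooling, then mix the Hyperpoint sequence along the time axis. All layer sizes, weights, and function names below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Illustrative sketch only: layer sizes and random weights are assumptions
# for demonstration, not SequentialPointNet's real architecture.
rng = np.random.default_rng(0)

def embed_hyperpoint(frame, w1, w2):
    """Map one point cloud frame (N, 3) to a single Hyperpoint vector (D,).

    A PointNet-style frame encoder: a shared per-point MLP followed by
    symmetric max pooling, so the result is invariant to point order and
    each frame can be embedded independently (frame-level parallelism).
    """
    h = np.maximum(frame @ w1, 0.0)   # shared per-point MLP, layer 1 -> (N, H)
    h = np.maximum(h @ w2, 0.0)       # shared per-point MLP, layer 2 -> (N, D)
    return h.max(axis=0)              # order-invariant max pooling -> (D,)

def classify_sequence(frames, w1, w2, w_time, w_cls):
    """Embed each frame independently, then mix the resulting Hyperpoint
    sequence along the temporal axis (a crude stand-in for the
    Hyperpoint-Mixer module)."""
    hyperpoints = np.stack([embed_hyperpoint(f, w1, w2) for f in frames])  # (T, D)
    mixed = np.maximum(w_time @ hyperpoints, 0.0)  # temporal mixing -> (T, D)
    pooled = mixed.max(axis=0)                     # sequence-level feature -> (D,)
    return pooled @ w_cls                          # class logits -> (C,)

T, N, H, D, C = 8, 128, 32, 64, 10
frames = [rng.normal(size=(N, 3)) for _ in range(T)]
w1 = rng.normal(size=(3, H))
w2 = rng.normal(size=(H, D))
w_time = rng.normal(size=(T, T))
w_cls = rng.normal(size=(D, C))
logits = classify_sequence(frames, w1, w2, w_time, w_cls)
print(logits.shape)  # (10,)
```

Because the per-frame pooling is symmetric, shuffling the points within a frame leaves its Hyperpoint unchanged, which is the property that lets static point cloud techniques stand in for spatio-temporal local encoding.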
Related papers
- 3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion [19.60626235337542]
3DMambaComplete is a point cloud completion network built on the novel Mamba framework.
It encodes point cloud features using Mamba's selection mechanism and predicts a set of Hyperpoints.
A deformation method transforms the 2D mesh representation of Hyperpoints into a fine-grained 3D structure for point cloud reconstruction.
arXiv Detail & Related papers (2024-04-10T15:45:03Z) - Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z) - SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network).
A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh.
The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z) - StarNet: Style-Aware 3D Point Cloud Generation [82.30389817015877]
StarNet is able to reconstruct and generate high-fidelity 3D point clouds using a mapping network.
Our framework achieves comparable state-of-the-art performance on various metrics in the point cloud reconstruction and generation tasks.
arXiv Detail & Related papers (2023-03-28T08:21:44Z) - MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term
Motion-Guided Temporal Attention for 3D Object Detection [8.305942415868042]
Most LiDAR sensors generate a sequence of point clouds in real-time.
Recent studies have revealed that substantial performance improvement can be achieved by exploiting the context present in a sequence of point sets.
We propose a novel 3D object detection architecture, which can encode point cloud sequences acquired by multiple successive scans.
arXiv Detail & Related papers (2022-12-01T11:24:47Z) - PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences [51.53563462897779]
We propose a Point Spatio-Temporal (PST) convolution to achieve informative representations of point cloud sequences.
PST convolution first disentangles space and time in point cloud sequences; a spatial convolution then captures the local structure of points in 3D space, and a temporal convolution models the dynamics of the spatial regions along the time dimension.
We incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner.
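The disentangled spatial-then-temporal idea can be sketched in a few lines. This is a toy illustration under assumed operators (a radius query with mean aggregation for the spatial step, a valid-padding 1D convolution for the temporal step), not PSTNet's actual PST convolution:

```python
import numpy as np

# Toy sketch of a disentangled spatial-then-temporal convolution; the
# radius query, mean aggregation, and kernel are illustrative assumptions,
# not PSTNet's actual operators.

def spatial_step(points, feats, radius, w):
    """For each point, average the features of neighbors within `radius`
    (each point counts as its own neighbor), then apply a linear map."""
    out = np.empty((points.shape[0], w.shape[1]))
    for i, p in enumerate(points):
        dist = np.linalg.norm(points - p, axis=1)
        out[i] = feats[dist < radius].mean(axis=0) @ w
    return out

def temporal_step(frame_feats, kernel):
    """1D convolution (valid padding) over the time axis of per-frame
    feature vectors; `kernel` has shape (K,)."""
    K = len(kernel)
    return np.stack([
        sum(kernel[k] * frame_feats[t + k] for k in range(K))
        for t in range(len(frame_feats) - K + 1)
    ])

rng = np.random.default_rng(1)
T, N, F, G = 6, 64, 8, 16
seq_pts = [rng.normal(size=(N, 3)) for _ in range(T)]
seq_feats = [rng.normal(size=(N, F)) for _ in range(T)]
w = rng.normal(size=(F, G))

# Spatial convolution within each frame, pooled to one vector per frame,
# then a temporal convolution across frames.
pooled = [spatial_step(p, f, radius=1.5, w=w).max(axis=0)
          for p, f in zip(seq_pts, seq_feats)]
out = temporal_step(pooled, kernel=np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (4, 16)
```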
arXiv Detail & Related papers (2022-05-27T02:14:43Z) - PointAttN: You Only Need Attention for Point Cloud Completion [89.88766317412052]
Point cloud completion refers to completing 3D shapes from partial 3D point clouds.
We propose a novel neural network that processes point clouds in a per-point manner, eliminating the need for kNN operations.
The proposed framework, namely PointAttN, is simple, neat and effective, which can precisely capture the structural information of 3D shapes.
arXiv Detail & Related papers (2022-03-16T09:20:01Z) - Anchor-Based Spatial-Temporal Attention Convolutional Networks for
Dynamic 3D Point Cloud Sequences [20.697745449159097]
An Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around it.
The proposed method makes better use of the structured information within the local region and learns spatio-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z) - SoftPoolNet: Shape Descriptor for Point Cloud Completion and
Classification [93.54286830844134]
We propose a method for 3D object completion and classification based on point clouds.
For the decoder stage, we propose regional convolutions, a novel operator aimed at maximizing the global activation entropy.
We evaluate our approach on different 3D tasks such as object completion and classification, achieving state-of-the-art accuracy.
arXiv Detail & Related papers (2020-08-17T14:32:35Z) - CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations [72.4716073597902]
We propose a method to learn Canonical Spatiotemporal Point Cloud Representations of dynamically moving or deforming objects.
We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation.
arXiv Detail & Related papers (2020-08-06T17:58:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.