Real-time 3D human action recognition based on Hyperpoint sequence
- URL: http://arxiv.org/abs/2111.08492v3
- Date: Mon, 26 Feb 2024 08:48:08 GMT
- Title: Real-time 3D human action recognition based on Hyperpoint sequence
- Authors: Xing Li, Qian Huang, Zhijian Wang, Zhenjie Hou, Tianjin Yang, Zhuang
Miao
- Abstract summary: We propose a lightweight and effective point cloud sequence network for real-time 3D action recognition.
Instead of capturing spatio-temporal local structures, SequentialPointNet encodes the temporal evolution of static appearances to recognize human actions.
Experiments on three widely-used 3D action recognition datasets demonstrate that the proposed SequentialPointNet achieves competitive classification performance while running up to 10X faster than existing approaches.
- Score: 14.218567196931687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time 3D human action recognition has broad industrial applications, such
as surveillance, human-computer interaction, and healthcare monitoring. By
relying on complex spatio-temporal local encoding, most existing point cloud
sequence networks capture spatio-temporal local structures to recognize 3D
human actions. To simplify the point cloud sequence modeling task, we propose a
lightweight and effective point cloud sequence network referred to as
SequentialPointNet for real-time 3D action recognition. Instead of capturing
spatio-temporal local structures, SequentialPointNet encodes the temporal
evolution of static appearances to recognize human actions. Firstly, we define
a novel type of point data, Hyperpoint, to better describe the temporally
changing human appearances. A theoretical foundation is provided to clarify the
information equivalence property for converting point cloud sequences into
Hyperpoint sequences. Secondly, the point cloud sequence modeling task is
decomposed into a Hyperpoint embedding task and a Hyperpoint sequence modeling
task. Specifically, for Hyperpoint embedding, the static point cloud technology
is employed to convert point cloud sequences into Hyperpoint sequences, which
introduces inherent frame-level parallelism; for Hyperpoint sequence modeling,
a Hyperpoint-Mixer module is designed as the basic building block to learn
the spatio-temporal features of human actions. Extensive experiments on three
widely-used 3D action recognition datasets demonstrate that the proposed
SequentialPointNet achieves competitive classification performance while running
up to 10X faster than existing approaches.
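As a rough illustration of the two-stage decomposition described above, a minimal NumPy sketch might embed each frame into a Hyperpoint with a PointNet-style shared MLP and max pooling, then mix the Hyperpoint sequence along the time axis. All layer sizes, weights, and function names below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Illustrative sketch only: layer sizes and random weights are assumptions
# for demonstration, not SequentialPointNet's real architecture.
rng = np.random.default_rng(0)

def embed_hyperpoint(frame, w1, w2):
    """Map one point cloud frame (N, 3) to a single Hyperpoint vector (D,).

    A PointNet-style frame encoder: a shared per-point MLP followed by
    symmetric max pooling, so the result is invariant to point order and
    each frame can be embedded independently (frame-level parallelism).
    """
    h = np.maximum(frame @ w1, 0.0)   # shared per-point MLP, layer 1 -> (N, H)
    h = np.maximum(h @ w2, 0.0)       # shared per-point MLP, layer 2 -> (N, D)
    return h.max(axis=0)              # order-invariant max pooling -> (D,)

def classify_sequence(frames, w1, w2, w_time, w_cls):
    """Embed each frame independently, then mix the resulting Hyperpoint
    sequence along the temporal axis (a crude stand-in for the
    Hyperpoint-Mixer module)."""
    hyperpoints = np.stack([embed_hyperpoint(f, w1, w2) for f in frames])  # (T, D)
    mixed = np.maximum(w_time @ hyperpoints, 0.0)  # temporal mixing -> (T, D)
    pooled = mixed.max(axis=0)                     # sequence-level feature -> (D,)
    return pooled @ w_cls                          # class logits -> (C,)

T, N, H, D, C = 8, 128, 32, 64, 10
frames = [rng.normal(size=(N, 3)) for _ in range(T)]
w1 = rng.normal(size=(3, H))
w2 = rng.normal(size=(H, D))
w_time = rng.normal(size=(T, T))
w_cls = rng.normal(size=(D, C))
logits = classify_sequence(frames, w1, w2, w_time, w_cls)
print(logits.shape)  # (10,)
```

Because the per-frame pooling is symmetric, shuffling the points within a frame leaves its Hyperpoint unchanged, which is the property that lets static point cloud techniques stand in for spatio-temporal local encoding.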
Related papers
- 3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion [19.60626235337542]
3DMambaComplete is a point cloud completion network built on the novel Mamba framework.
It encodes point cloud features using Mamba's selection mechanism and predicts a set of Hyperpoints.
A deformation method transforms the 2D mesh representation of Hyperpoints into a fine-grained 3D structure for point cloud reconstruction.
arXiv Detail & Related papers (2024-04-10T15:45:03Z) - Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z) - SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network).
A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh.
The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z) - StarNet: Style-Aware 3D Point Cloud Generation [82.30389817015877]
StarNet is able to reconstruct and generate high-fidelity 3D point clouds using a mapping network.
Our framework achieves comparable state-of-the-art performance on various metrics in the point cloud reconstruction and generation tasks.
arXiv Detail & Related papers (2023-03-28T08:21:44Z) - MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term
Motion-Guided Temporal Attention for 3D Object Detection [8.305942415868042]
Most LiDAR sensors generate a sequence of point clouds in real-time.
Recent studies have revealed that substantial performance improvement can be achieved by exploiting the context present in a sequence of point sets.
We propose a novel 3D object detection architecture, which can encode point cloud sequences acquired by multiple successive scans.
arXiv Detail & Related papers (2022-12-01T11:24:47Z) - PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences [51.53563462897779]
We propose a Point Spatio-Temporal (PST) convolution to achieve informative representations of point cloud sequences.
PST convolution first disentangles space and time in point cloud sequences; a spatial convolution then captures the local structure of points in 3D space, and a temporal convolution models the dynamics of the spatial regions along the time dimension.
We incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner.
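The disentangled spatial-then-temporal idea can be sketched in a few lines. This is a toy illustration under assumed operators (a radius query with mean aggregation for the spatial step, a valid-padding 1D convolution for the temporal step), not PSTNet's actual PST convolution:

```python
import numpy as np

# Toy sketch of a disentangled spatial-then-temporal convolution; the
# radius query, mean aggregation, and kernel are illustrative assumptions,
# not PSTNet's actual operators.

def spatial_step(points, feats, radius, w):
    """For each point, average the features of neighbors within `radius`
    (each point counts as its own neighbor), then apply a linear map."""
    out = np.empty((points.shape[0], w.shape[1]))
    for i, p in enumerate(points):
        dist = np.linalg.norm(points - p, axis=1)
        out[i] = feats[dist < radius].mean(axis=0) @ w
    return out

def temporal_step(frame_feats, kernel):
    """1D convolution (valid padding) over the time axis of per-frame
    feature vectors; `kernel` has shape (K,)."""
    K = len(kernel)
    return np.stack([
        sum(kernel[k] * frame_feats[t + k] for k in range(K))
        for t in range(len(frame_feats) - K + 1)
    ])

rng = np.random.default_rng(1)
T, N, F, G = 6, 64, 8, 16
seq_pts = [rng.normal(size=(N, 3)) for _ in range(T)]
seq_feats = [rng.normal(size=(N, F)) for _ in range(T)]
w = rng.normal(size=(F, G))

# Spatial convolution within each frame, pooled to one vector per frame,
# then a temporal convolution across frames.
pooled = [spatial_step(p, f, radius=1.5, w=w).max(axis=0)
          for p, f in zip(seq_pts, seq_feats)]
out = temporal_step(pooled, kernel=np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (4, 16)
```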
arXiv Detail & Related papers (2022-05-27T02:14:43Z) - PointAttN: You Only Need Attention for Point Cloud Completion [89.88766317412052]
Point cloud completion refers to completing 3D shapes from partial 3D point clouds.
We propose a novel neural network that processes point clouds in a per-point manner, eliminating the need for kNN operations.
The proposed framework, namely PointAttN, is simple, neat and effective, which can precisely capture the structural information of 3D shapes.
arXiv Detail & Related papers (2022-03-16T09:20:01Z) - Anchor-Based Spatial-Temporal Attention Convolutional Networks for
Dynamic 3D Point Cloud Sequences [20.697745449159097]
An Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around it.
The proposed method makes better use of the structured information within the local region and learns spatio-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z) - SoftPoolNet: Shape Descriptor for Point Cloud Completion and
Classification [93.54286830844134]
We propose a method for 3D object completion and classification based on point clouds.
For the decoder stage, we propose regional convolutions, a novel operator aimed at maximizing the global activation entropy.
We evaluate our approach on different 3D tasks such as object completion and classification, achieving state-of-the-art accuracy.
arXiv Detail & Related papers (2020-08-17T14:32:35Z) - CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations [72.4716073597902]
We propose a method to learn Canonical Spatiotemporal Point Cloud Representations of dynamically moving or deforming objects.
We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation.
arXiv Detail & Related papers (2020-08-06T17:58:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.