Human Action Recognition from Point Clouds over Time
- URL: http://arxiv.org/abs/2510.05506v3
- Date: Thu, 09 Oct 2025 01:21:42 GMT
- Title: Human Action Recognition from Point Clouds over Time
- Authors: James Dickens
- Abstract summary: This paper presents a novel approach for recognizing actions from 3D videos by introducing a pipeline that segments human point clouds from the background of a scene. The method supports point clouds from both depth sensors and monocular depth estimation. Experiments incorporate auxiliary point features, including surface normals, color, infrared intensity, and body part parsing labels, to enhance recognition accuracy.
- Score: 0.6345523830122167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research into human action recognition (HAR) has focused predominantly on skeletal action recognition and video-based methods. With the increasing availability of consumer-grade depth sensors and LiDAR instruments, there is a growing opportunity to leverage dense 3D data for action recognition as a third alternative. This paper presents a novel approach for recognizing actions from 3D videos by introducing a pipeline that segments human point clouds from the background of a scene, tracks individuals over time, and performs body part segmentation. The method supports point clouds from both depth sensors and monocular depth estimation. At the core of the proposed HAR framework is a novel backbone for 3D action recognition, which combines point-based techniques with sparse convolutional networks applied to voxel-mapped point cloud sequences. Experiments incorporate auxiliary point features, including surface normals, color, infrared intensity, and body part parsing labels, to enhance recognition accuracy. Evaluation on the NTU RGB+D 120 dataset demonstrates that the method is competitive with existing skeletal action recognition algorithms. Moreover, by combining both sensor-based and estimated depth inputs in an ensemble setup, this approach achieves 89.3% accuracy when different human subjects are used for training and testing, outperforming previous point cloud action recognition methods.
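The abstract describes the input side of the pipeline only at a high level. The following is a minimal sketch, not the author's implementation, of two generic steps such a pipeline could use: back-projecting a depth map (from a sensor or a monocular estimate) into a camera-frame point cloud with a pinhole model, and mapping a point cloud sequence to sparse voxel coordinates of the kind sparse convolutional networks consume. The intrinsics `fx, fy, cx, cy`, the `voxel_size`, using the frame index as a fourth sparse coordinate, and averaging points per voxel are all assumptions; `backproject` and `voxelize_sequence` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int, int]  # (ix, iy, iz, t)


def backproject(depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Lift a row-major metric depth map to camera-frame 3D points with a
    pinhole camera model. Pixels with depth <= 0 are skipped as invalid."""
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row
        for u, z in enumerate(row):      # u: pixel column
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def voxelize_sequence(frames: Sequence[Sequence[Point]],
                      voxel_size: float) -> Dict[VoxelKey, Point]:
    """Quantize a point cloud sequence into sparse 4D voxel coordinates.

    Assumptions (hypothetical, not from the paper): the frame index t is the
    fourth coordinate, and points sharing a voxel in one frame are averaged.
    """
    acc: Dict[VoxelKey, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for t, frame in enumerate(frames):
        for x, y, z in frame:
            key = (math.floor(x / voxel_size), math.floor(y / voxel_size),
                   math.floor(z / voxel_size), t)
            s = acc[key]
            s[0] += x
            s[1] += y
            s[2] += z
            s[3] += 1  # number of points in this voxel
    return {k: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for k, s in acc.items()}


# Toy usage: a 2x2 depth map with identity intrinsics (hypothetical values).
pts = backproject([[1.0, 1.0], [0.0, 2.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
voxels = voxelize_sequence([pts, pts], voxel_size=1.5)
print(len(voxels))  # 4: two occupied voxels in each of two frames
```

Averaging per voxel is only one common reduction; a real backbone would also carry auxiliary per-point features (normals, color, infrared intensity, part labels) through the same quantization step.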
Related papers
- Robust Human Registration with Body Part Segmentation on Noisy Point Clouds [73.00876572870787]
We introduce a hybrid approach that incorporates body-part segmentation into the mesh fitting process. Our method first assigns body part labels to individual points, which then guide a two-step SMPL-X fitting. We demonstrate that the fitted human mesh can refine body part labels, leading to improved segmentation.
(2025-04-04T17:17:33Z)
- Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis [48.59382455101753]
2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose.
Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information.
In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training.
Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining.
(2024-03-11T09:12:24Z)
- PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds [99.60575439926963]
We propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings.
PointHPS iteratively refines point features through a cascaded architecture.
Extensive experiments demonstrate that PointHPS, with its powerful point feature extraction and processing scheme, outperforms state-of-the-art methods.
(2023-08-28T11:10:14Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3
(2022-09-01T12:32:40Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
(2022-08-24T16:54:38Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and in improving feature learning for point-based 3D detection.
(2022-01-06T08:54:47Z)
- Analysis and Evaluation of Kinect-based Action Recognition Algorithms [2.7064617166078087]
We implement and improve the HDG algorithm and apply it to cross-view action recognition using the UWA3D Multiview Activity dataset.
The experimental results show that our improved HDG outperforms three other state-of-the-art algorithms for cross-view action recognition.
(2021-12-16T05:04:06Z)
- VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds [32.72878775887121]
We make the first attempt to reconstruct reliable 3D human shapes from single-frame partial point clouds.
We propose an end-to-end learnable method, named VoteHMR.
The proposed method achieves state-of-the-art performance on two large-scale datasets.
(2021-10-17T05:42:04Z)
- Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences [20.697745449159097]
Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field around each point by placing several virtual anchors around it.
The proposed method makes better use of the structured information within each local region and learns spatial-temporal embedding features from dynamic 3D point cloud sequences.
(2020-12-20T07:35:37Z)
- 3DFCNN: Real-Time Action Recognition using 3D Deep Neural Networks with Raw Depth Information [1.3854111346209868]
This paper describes an approach for real-time human action recognition from raw depth image-sequences, provided by an RGB-D camera.
The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from depth sequences without any costly pre-processing.
(2020-06-13T23:24:07Z)
- Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection [64.2159881697615]
Object detection from 3D point clouds remains a challenging task, though recent studies pushed the envelope with the deep learning techniques.
We propose a domain adaptation like approach to enhance the robustness of the feature representation.
Our simple yet effective approach fundamentally boosts the performance of 3D point cloud object detection and achieves the state-of-the-art results.
(2020-06-08T05:15:06Z)
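The SASA entry above describes a point sampling algorithm guided by point-wise foreground scores. A minimal sketch of one way to realize this idea follows, assuming the farthest point sampling (FPS) distance is multiplied by the foreground score raised to an exponent `gamma`, and that sampling starts from the highest-scoring point. SASA's exact formulation may differ, and `semantic_fps` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def semantic_fps(points: Sequence[Point], scores: Sequence[float],
                 k: int, gamma: float = 1.0) -> List[int]:
    """Farthest point sampling with distances weighted by foreground scores.

    Assumed metric: scores[i] ** gamma * (distance from point i to its nearest
    already-sampled point). With gamma = 0 this reduces to plain FPS.
    """
    n = len(points)
    k = min(k, n)
    if k <= 0:
        return []
    # Assumption: start from the most confident foreground point.
    first = max(range(n), key=lambda i: scores[i])
    selected = [first]
    chosen = {first}
    # Distance from every point to its nearest sampled point so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        best = max((i for i in range(n) if i not in chosen),
                   key=lambda i: (scores[i] ** gamma) * min_dist[i])
        selected.append(best)
        chosen.add(best)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[best]))
    return selected


pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
fg = [0.9, 0.9, 0.01, 0.9]  # point 2 is likely background
print(semantic_fps(pts, fg, k=3))           # [0, 3, 1]
print(semantic_fps(pts, fg, k=3, gamma=0))  # [0, 3, 2], plain FPS
```

In the toy example, plain FPS spends its third sample on the isolated low-score point, while the score-weighted variant keeps a foreground point instead; lowering `gamma` toward 0 trades that semantic bias for the uniform coverage of plain FPS.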