PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
- URL: http://arxiv.org/abs/2205.13713v1
- Date: Fri, 27 May 2022 02:14:43 GMT
- Title: PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
- Authors: Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli
- Abstract summary: We propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences.
PST convolution first disentangles space and time in point cloud sequences; a spatial convolution is then employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
We incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner.
- Score: 51.53563462897779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud sequences are irregular and unordered in the spatial dimension
while exhibiting regularities and order in the temporal dimension. Therefore,
existing grid based convolutions for conventional video processing cannot be
directly applied to spatio-temporal modeling of raw point cloud sequences. In
this paper, we propose a point spatio-temporal (PST) convolution to achieve
informative representations of point cloud sequences. The proposed PST
convolution first disentangles space and time in point cloud sequences. Then, a
spatial convolution is employed to capture the local structure of points in the
3D space, and a temporal convolution is used to model the dynamics of the
spatial regions along the time dimension. Furthermore, we incorporate the
proposed PST convolution into a deep network, namely PSTNet, to extract
features of point cloud sequences in a hierarchical manner. Extensive
experiments on widely-used 3D action recognition and 4D semantic segmentation
datasets demonstrate the effectiveness of PSTNet to model point cloud
sequences.
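To make the space-time decomposition concrete, below is a minimal PyTorch-style sketch of a decoupled spatial-then-temporal point convolution. The class name PSTConvSketch, the kNN spatial grouping, the shared MLP, and the assumption that point indices stay aligned across frames are illustrative simplifications, not the PSTNet implementation.
```python
# Minimal, illustrative sketch of a decomposed point spatio-temporal convolution:
# a per-frame spatial aggregation followed by a per-point temporal convolution.
# NOT the authors' implementation; names, kNN grouping, and the aligned-points
# assumption are simplifications for illustration only.
import torch
import torch.nn as nn


class PSTConvSketch(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, k: int = 16, t_kernel: int = 3):
        super().__init__()
        self.k = k
        # Shared MLP applied to [relative xyz, neighbor feature] in the spatial step.
        self.spatial_mlp = nn.Sequential(
            nn.Linear(3 + in_channels, out_channels),
            nn.ReLU(inplace=True),
        )
        # 1D convolution over the time axis in the temporal step.
        self.temporal_conv = nn.Conv1d(out_channels, out_channels,
                                       kernel_size=t_kernel, padding=t_kernel // 2)

    def forward(self, xyz: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # xyz:  (T, N, 3)  point coordinates per frame
        # feat: (T, N, C)  point features per frame
        T, N, _ = xyz.shape
        # ---- spatial convolution (per frame) ----
        dist = torch.cdist(xyz, xyz)                                 # (T, N, N)
        knn_idx = dist.topk(self.k, dim=-1, largest=False).indices   # (T, N, k)
        batch_idx = torch.arange(T).view(T, 1, 1)
        nbr_xyz = xyz[batch_idx, knn_idx]                            # (T, N, k, 3)
        nbr_feat = feat[batch_idx, knn_idx]                          # (T, N, k, C)
        rel = nbr_xyz - xyz.unsqueeze(2)                             # relative offsets
        spatial = self.spatial_mlp(torch.cat([rel, nbr_feat], dim=-1))
        spatial = spatial.max(dim=2).values                          # pool over neighbors
        # ---- temporal convolution (per point, over frames) ----
        # Assumes point index n refers to the same spatial region in every frame.
        out = self.temporal_conv(spatial.permute(1, 2, 0))           # (N, C_out, T)
        return out.permute(2, 0, 1)                                  # (T, N, C_out)


# Toy usage: 4 frames, 128 points, 6-dim input features.
# conv = PSTConvSketch(in_channels=6, out_channels=32)
# y = conv(torch.rand(4, 128, 3), torch.rand(4, 128, 6))  # -> (4, 128, 32)
```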
Related papers
- SPiKE: 3D Human Pose from Point Cloud Sequences [1.8024397171920885]
3D Human Pose Estimation (HPE) is the task of locating keypoints of the human body in 3D space from 2D or 3D representations such as RGB images, depth maps or point clouds.
This paper presents SPiKE, a novel approach to 3D HPE using point cloud sequences.
Experiments on the ITOP benchmark for 3D HPE show that SPiKE reaches 89.19% mAP, achieving state-of-the-art performance with significantly lower inference times.
arXiv Detail & Related papers (2024-09-03T13:22:01Z)
- Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z)
- Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos [75.9251839023226]
We propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations.
MaST-Pre consists of two self-supervised learning tasks. First, by reconstructing masked point tubes, our method is able to capture appearance information of point cloud videos.
Second, to learn motion, we propose a temporal cardinality difference prediction task that estimates the change in the number of points within a point tube.
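The cardinality-difference idea can be illustrated with a toy computation: count how many points fall inside a tube in each frame and difference the counts over time. The sketch below is an illustration only, not the MaST-Pre implementation; it models a tube as a fixed radius around a single center, which is a simplification of the paper's point tubes.
```python
# Illustrative sketch: for a "tube" defined here as a fixed radius around a center,
# count the points inside the tube in each frame and return the frame-to-frame
# change in that count (a temporal cardinality difference signal).
import torch


def temporal_cardinality_difference(frames: torch.Tensor, center: torch.Tensor,
                                    radius: float = 0.1) -> torch.Tensor:
    # frames: (T, N, 3) point coordinates per frame; center: (3,) tube center.
    dist = torch.linalg.norm(frames - center, dim=-1)   # (T, N) distance to center
    counts = (dist < radius).sum(dim=-1).float()        # points inside the tube per frame
    return counts[1:] - counts[:-1]                     # (T-1,) cardinality change over time
```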
arXiv Detail & Related papers (2023-08-18T02:12:54Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our method performs favorably against current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Real-time 3D human action recognition based on Hyperpoint sequence [14.218567196931687]
We propose a lightweight and effective point cloud sequence network for real-time 3D action recognition.
Instead of capturing spatio-temporal local structures, SequentialPointNet encodes the temporal evolution of static appearances to recognize human actions.
Experiments on three widely-used 3D action recognition datasets demonstrate that the proposed SequentialPointNet achieves competitive classification performance while being up to 10X faster than existing approaches.
arXiv Detail & Related papers (2021-11-16T14:13:32Z)
- Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences [20.697745449159097]
An Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around each point.
The proposed method makes better use of the structured information within the local region, and learns spatial-temporal embedding features from dynamic 3D point cloud sequences.
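As a rough illustration of the virtual-anchor idea (not the ASTAConv operation itself), the sketch below surrounds each point with six axis-aligned virtual anchors and fills each anchor with an inverse-distance-weighted average of nearby point features, yielding a regular per-point receptive field; the anchor layout and weighting scheme are assumptions for illustration.
```python
# Rough sketch of a virtual-anchor receptive field (illustration only, not ASTAConv):
# place six axis-aligned anchors around each point and aggregate the features of the
# k nearest real points into each anchor with inverse-distance weights.
import torch


def anchor_features(xyz: torch.Tensor, feat: torch.Tensor,
                    radius: float = 0.1, k: int = 4) -> torch.Tensor:
    # xyz: (N, 3) points of one frame, feat: (N, C) their features.
    offsets = radius * torch.tensor([[1., 0, 0], [-1., 0, 0], [0, 1., 0],
                                     [0, -1., 0], [0, 0, 1.], [0, 0, -1.]])
    anchors = xyz.unsqueeze(1) + offsets                 # (N, 6, 3) virtual anchors
    dist = torch.cdist(anchors.reshape(-1, 3), xyz)      # (N*6, N) anchor-to-point distances
    d, idx = dist.topk(k, dim=-1, largest=False)         # k nearest real points per anchor
    w = 1.0 / (d + 1e-8)
    w = w / w.sum(dim=-1, keepdim=True)                  # inverse-distance weights
    agg = (feat[idx] * w.unsqueeze(-1)).sum(dim=1)       # (N*6, C) weighted feature average
    return agg.reshape(xyz.shape[0], 6, -1)              # (N, 6, C) regular grid per point
```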
arXiv Detail & Related papers (2020-12-20T07:35:37Z)
- CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations [72.4716073597902]
We propose a method to learn Canonical Spatiotemporal Point Cloud Representations of dynamically moving objects.
We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation.
arXiv Detail & Related papers (2020-08-06T17:58:48Z)
- Pseudo-LiDAR Point Cloud Interpolation Based on 3D Motion Representation and Spatial Supervision [68.35777836993212]
We propose a Pseudo-LiDAR point cloud network to generate temporally and spatially high-quality point cloud sequences.
By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship.
arXiv Detail & Related papers (2020-06-20T03:11:04Z)
- Unsupervised Learning of Global Registration of Temporal Sequence of Point Clouds [16.019588704177288]
Global registration of point clouds aims to find an optimal alignment of a sequence of 2D or 3D point sets.
We present a novel method that takes advantage of current deep learning techniques for unsupervised learning of global registration from a temporal sequence of point clouds.
arXiv Detail & Related papers (2020-06-17T06:00:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.