Spatial-Temporal Transformer for 3D Point Cloud Sequences
- URL: http://arxiv.org/abs/2110.09783v1
- Date: Tue, 19 Oct 2021 07:55:47 GMT
- Title: Spatial-Temporal Transformer for 3D Point Cloud Sequences
- Authors: Yimin Wei, Hao Liu, Tingting Xie, Qiuhong Ke, Yulan Guo
- Abstract summary: We propose a novel framework named Point Spatial-Temporal Transformer (PST2) to learn spatial-temporal representations.
Our PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module.
We test the effectiveness of our PST2 with two different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D action recognition.
- Score: 23.000688043417913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective learning of spatial-temporal information within a point cloud
sequence is highly important for many down-stream tasks such as 4D semantic
segmentation and 3D action recognition. In this paper, we propose a novel
framework named Point Spatial-Temporal Transformer (PST2) to learn
spatial-temporal representations from dynamic 3D point cloud sequences. Our
PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA)
module and a Resolution Embedding (RE) module. Our STSA module is introduced to
capture the spatial-temporal context information across adjacent frames, while
the RE module is proposed to aggregate features across neighbors to enhance the
resolution of feature maps. We test the effectiveness of our PST2 with two
different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D
action recognition. Extensive experiments on three benchmarks show that our
PST2 outperforms existing methods on all datasets. The effectiveness of our
STSA and RE modules has also been justified with ablation experiments.
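To make the idea of attention across adjacent frames concrete, here is a minimal sketch of generic scaled dot-product self-attention in which every point of every frame is one token. It is not the STSA module from the paper: the function name, the `(T, N, C)` feature layout, and the random projection matrices are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_self_attention(feats, w_q, w_k, w_v):
    """Generic self-attention over points pooled from adjacent frames.

    feats: (T, N, C) array, T adjacent frames with N points of C features each
           (hypothetical layout, not taken from the paper).
    w_q, w_k, w_v: (C, D) projection matrices.
    Returns the attended features (T, N, D) and the attention map (T*N, T*N).
    """
    t, n, c = feats.shape
    tokens = feats.reshape(t * n, c)           # one token per point per frame
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # scaled dot-product similarity
    attn = softmax(scores, axis=-1)            # each row sums to 1
    out = attn @ v                             # mix features across space and time
    return out.reshape(t, n, -1), attn


# Toy input: 3 frames, 8 points per frame, 16 feature channels.
rng = np.random.default_rng(0)
T, N, C, D = 3, 8, 16, 16
feats = rng.normal(size=(T, N, C))
w_q, w_k, w_v = (rng.normal(size=(C, D)) * 0.1 for _ in range(3))
out, attn = spatio_temporal_self_attention(feats, w_q, w_k, w_v)
```

Because the tokens from all frames share one attention map, a point in one frame can draw on features from points in neighbouring frames, which is the general sense in which such attention captures spatial-temporal context.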
Related papers
- State Space Model Meets Transformer: A New Paradigm for 3D Object Detection [33.49952392298874]
We propose a new 3D object DEtection paradigm with an interactive STate space model (DEST).
In the interactive SSM, we design a novel state-dependent SSM parameterization method that enables system states to effectively serve as queries in 3D indoor detection tasks.
Our method improves the GroupFree baseline in terms of AP50 on ScanNet V2 and SUN RGB-D datasets.
arXiv Detail & Related papers (2025-03-18T17:58:03Z) - FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud [7.711666704468952]
We address the problem of traversability assessment using point clouds.
We propose a pillar feature extraction module that utilizes PointNet to capture features from point clouds organized in vertical volume.
We then propose a new temporal attention module to fuse multi-frame information, which can properly handle the varying density problem of LiDAR point clouds.
arXiv Detail & Related papers (2024-06-24T12:01:55Z) - Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z) - PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences [51.53563462897779]
We propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences.
PST first disentangles space and time in point cloud sequences. A spatial convolution is then employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
We incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner.
arXiv Detail & Related papers (2022-05-27T02:14:43Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - SIENet: Spatial Information Enhancement Network for 3D Object Detection from Point Cloud [20.84329063509459]
LiDAR-based 3D object detection pushes forward an immense influence on autonomous vehicles.
Due to the limitation of the intrinsic properties of LiDAR, fewer points are collected at the objects farther away from the sensor.
To address the challenge, we propose a novel two-stage 3D object detection framework, named SIENet.
arXiv Detail & Related papers (2021-03-29T07:45:09Z) - Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences [20.697745449159097]
Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around each point.
The proposed method makes better use of the structured information within the local region, and learns spatial-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z) - LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on the single-frame detection, while ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.