Graph Neural Network and Spatiotemporal Transformer Attention for 3D
Video Object Detection from Point Clouds
- URL: http://arxiv.org/abs/2207.12659v1
- Date: Tue, 26 Jul 2022 05:16:28 GMT
- Title: Graph Neural Network and Spatiotemporal Transformer Attention for 3D
Video Object Detection from Point Clouds
- Authors: Junbo Yin, Jianbing Shen, Xin Gao, David Crandall and Ruigang Yang
- Abstract summary: We propose to detect 3D objects by exploiting temporal information in multiple frames.
We implement our algorithm based on prevalent anchor-based and anchor-free detectors.
- Score: 94.21415132135951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous works for LiDAR-based 3D object detection mainly focus on the
single-frame paradigm. In this paper, we propose to detect 3D objects by
exploiting temporal information in multiple frames, i.e., the point cloud
videos. We empirically categorize the temporal information into short-term and
long-term patterns. To encode the short-term data, we present a Grid Message
Passing Network (GMPNet), which considers each grid (i.e., the grouped points)
as a node and constructs a k-NN graph with the neighbor grids. To update
features for a grid, GMPNet iteratively collects information from its
neighbors, thus mining the motion cues in grids from nearby frames. To further
aggregate the long-term frames, we propose an Attentive Spatiotemporal
Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA)
module and a Temporal Transformer Attention (TTA) module. STA and TTA enhance
the vanilla GRU to focus on small objects and better align the moving objects.
Our overall framework supports both online and offline video object detection
in point clouds. We implement our algorithm based on prevalent anchor-based and
anchor-free detectors. Evaluation on the challenging nuScenes benchmark shows
the superior performance of our method, which ranked 1st on the leaderboard,
without bells and whistles, at the time of submission.
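The two components described above can be illustrated with a toy sketch: a k-NN graph over grid centers with iterative neighbor aggregation (the idea behind GMPNet), and attention-based reweighting and alignment of recurrent memory (the idea behind STA/TTA in AST-GRU). All function names, the mean-aggregation rule, and the gated update below are illustrative assumptions, not the paper's actual learned implementation.

```python
import numpy as np

def knn_graph(coords, k):
    # Pairwise distances between grid centers; each grid node links to
    # its k nearest neighbor grids, as in GMPNet's graph construction.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-loops
    return np.argsort(d, axis=1)[:, :k]    # (N, k) neighbor indices

def message_passing(feats, neighbors, steps=3):
    # Iteratively update each grid feature with the mean of its neighbors'
    # features -- a hypothetical stand-in for GMPNet's learned aggregation.
    h = feats.copy()
    for _ in range(steps):
        msg = h[neighbors].mean(axis=1)    # (N, C) aggregated messages
        h = 0.5 * h + 0.5 * msg            # residual-style update (assumed)
    return h

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    # STA-style reweighting: every BEV location attends over all others,
    # letting responses for small objects borrow context. x: (N, C).
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))   # (N, N)
    return attn @ x

def tta_gru_step(h_prev, x_t):
    # TTA-style alignment: query the previous memory with the current
    # frame's features before a toy, untrained GRU-like gated update.
    attn = softmax(x_t @ h_prev.T / np.sqrt(x_t.shape[1]))
    h_aligned = attn @ h_prev                        # motion-aligned memory
    z = 1 / (1 + np.exp(-(x_t + h_aligned).mean(axis=1, keepdims=True)))
    return (1 - z) * h_aligned + z * np.tanh(x_t)    # (N, C) new memory
```

The real modules replace the mean aggregation and the scalar gate with learned networks; the sketch only mirrors the data flow (graph gather for short-term frames, attention-aligned recurrence for long-term frames).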
Related papers
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection [40.267769862404684]
We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.
Our main motivation is fusing object-aware latent embeddings into the early stages of a 3D object detector.
arXiv Detail & Related papers (2023-09-28T21:58:25Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, distinguishing the positive query from other highly similar queries that are not the best match poses a challenge for the network.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Collect-and-Distribute Transformer for 3D Point Cloud Analysis [82.03517861433849]
We propose a new transformer network equipped with a collect-and-distribute mechanism to communicate short- and long-range contexts of point clouds.
Results show the effectiveness of the proposed CDFormer, delivering several new state-of-the-art performances on point cloud classification and segmentation tasks.
arXiv Detail & Related papers (2023-06-02T03:48:45Z)
- MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection [8.305942415868042]
Most LiDAR sensors generate a sequence of point clouds in real-time.
Recent studies have revealed that substantial performance improvement can be achieved by exploiting the context present in a sequence of point sets.
We propose a novel 3D object detection architecture, which can encode point cloud sequences acquired by multiple successive scans.
arXiv Detail & Related papers (2022-12-01T11:24:47Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences [20.697745449159097]
This paper proposes an Anchor-based Spatial-Temporal Attention Convolution (ASTAConv) operation to process dynamic 3D point cloud sequences.
The proposed convolution builds a regular receptive field around each point by placing several virtual anchors around it.
The method makes better use of the structured information within the local region and learns spatial-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection, ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.