MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term
Motion-Guided Temporal Attention for 3D Object Detection
- URL: http://arxiv.org/abs/2212.00442v1
- Date: Thu, 1 Dec 2022 11:24:47 GMT
- Title: MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term
Motion-Guided Temporal Attention for 3D Object Detection
- Authors: Junho Koh, Junhyung Lee, Youngwoo Lee, Jaekyum Kim, Jun Won Choi
- Abstract summary: Most LiDAR sensors generate a sequence of point clouds in real-time.
Recent studies have revealed that substantial performance improvement can be achieved by exploiting the context present in a sequence of point sets.
We propose a novel 3D object detection architecture, which can encode point cloud sequences acquired by multiple successive scans.
- Score: 8.305942415868042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most scanning LiDAR sensors generate a sequence of point clouds in real-time.
While conventional 3D object detectors use a set of unordered LiDAR points
acquired over a fixed time interval, recent studies have revealed that
substantial performance improvement can be achieved by exploiting the
spatio-temporal context present in a sequence of LiDAR point sets. In this
paper, we propose a novel 3D object detection architecture, which can encode
LiDAR point cloud sequences acquired by multiple successive scans. The encoding
process of the point cloud sequence is performed on two different time scales.
We first design a short-term motion-aware voxel encoding that captures the
short-term temporal changes of point clouds driven by the motion of objects in
each voxel. We also propose long-term motion-guided bird's eye view (BEV)
feature enhancement that adaptively aligns and aggregates the BEV feature maps
obtained by the short-term voxel encoding by utilizing the dynamic motion
context inferred from the sequence of the feature maps. The experiments
conducted on the public nuScenes benchmark demonstrate that the proposed 3D
object detector offers significant improvements in performance compared to the
baseline methods and that it achieves state-of-the-art performance for certain 3D
object detection categories. Code is available at
https://github.com/HYjhkoh/MGTANet.git
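As a reading aid, the following is a minimal PyTorch sketch of the long-term aggregation step described in the abstract, under the assumption that a sequence of per-scan BEV feature maps is already available from the short-term voxel encoding. It is not the authors' implementation; the module name MotionGuidedBEVAggregator, its layer choices, and the tensor shapes are illustrative assumptions.
```python
# Minimal sketch (not the authors' code) of motion-guided temporal aggregation
# over a sequence of BEV feature maps: per-location attention weights are
# inferred from (current, past) feature pairs, then used to fuse the sequence.
import torch
import torch.nn as nn


class MotionGuidedBEVAggregator(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Infers a coarse per-location motion/attention score from each
        # (current, past) BEV feature pair.
        self.motion_score = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, bev_seq: torch.Tensor) -> torch.Tensor:
        # bev_seq: (B, T, C, H, W); index -1 is the current scan.
        T = bev_seq.shape[1]
        current = bev_seq[:, -1]
        scores = []
        for t in range(T):
            pair = torch.cat([current, bev_seq[:, t]], dim=1)   # (B, 2C, H, W)
            scores.append(self.motion_score(pair))              # (B, 1, H, W)
        weights = torch.softmax(torch.stack(scores, dim=1), dim=1)  # (B, T, 1, H, W)
        fused = (weights * bev_seq).sum(dim=1)                      # (B, C, H, W)
        return self.fuse(fused)


if __name__ == "__main__":
    # Toy usage: 3 past scans plus the current scan, 64-channel 128x128 BEV maps.
    agg = MotionGuidedBEVAggregator(channels=64)
    bev_seq = torch.randn(2, 4, 64, 128, 128)
    print(agg(bev_seq).shape)  # torch.Size([2, 64, 128, 128])
```
In MGTANet itself the alignment is guided by dynamic motion context inferred from the feature-map sequence rather than a plain softmax over frames; see the linked repository for the authors' implementation.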
Related papers
- Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences [25.74000325019015]
We introduce a novel LiDAR 3D object detection framework, namely LiSTM, to facilitate spatial-temporal feature learning with cross-frame motion forecasting information.
We conduct experiments on large-scale autonomous driving benchmarks, including nuScenes, to demonstrate that the proposed framework achieves superior 3D detection performance.
arXiv Detail & Related papers (2024-09-06T16:29:04Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences [38.7464958249103]
We propose MoDAR, using motion forecasting outputs as a type of virtual modality, to augment LiDAR point clouds.
A fused point cloud of both raw sensor points and the virtual points can then be fed to any off-the-shelf point-cloud based 3D object detector.
arXiv Detail & Related papers (2023-06-05T19:28:19Z)
- D-Align: Dual Query Co-attention Network for 3D Object Detection Based on Multi-frame Point Cloud Sequence [8.21339007493213]
Conventional 3D object detectors detect objects using a set of points acquired over a fixed duration.
Recent studies have shown that the performance of object detection can be further enhanced by utilizing point cloud sequences.
We propose D-Align, which can effectively produce strong bird's-eye-view (BEV) features by aligning and aggregating the features obtained from a sequence of point sets.
arXiv Detail & Related papers (2022-09-30T20:41:25Z)
- Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds [94.21415132135951]
We propose to detect 3D objects by exploiting temporal information in multiple frames.
We implement our algorithm based on prevalent anchor-based and anchor-free detectors.
arXiv Detail & Related papers (2022-07-26T05:16:28Z)
- A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose DS-Net, a novel dynamic spatiotemporal network, for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection, while ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)