TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object
Detection
- URL: http://arxiv.org/abs/2208.03141v1
- Date: Thu, 4 Aug 2022 15:41:43 GMT
- Title: TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object
Detection
- Authors: Zhipeng Luo, Gongjie Zhang, Changqing Zhou, Tianrui Liu, Shijian Lu,
Liang Pan
- Abstract summary: 3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics.
We design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames.
Our proposed TransPillars achieves state-of-the-art performance compared with existing multi-frame detection approaches.
- Score: 47.941714033657675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection using point clouds has attracted increasing attention due
to its wide applications in autonomous driving and robotics. However, most
existing studies focus on single point cloud frames without harnessing the
temporal information in point cloud sequences. In this paper, we design
TransPillars, a novel transformer-based feature aggregation technique that
exploits temporal features of consecutive point cloud frames for multi-frame 3D
object detection. TransPillars aggregates spatial-temporal point cloud features
from two perspectives. First, it fuses voxel-level features directly from
multi-frame feature maps instead of pooled instance features to preserve
instance details with contextual information that are essential to accurate
object localization. Second, it introduces a hierarchical coarse-to-fine
strategy to fuse multi-scale features progressively to effectively capture the
motion of moving objects and guide the aggregation of fine features. Besides, a
variant of deformable transformer is introduced to improve the effectiveness of
cross-frame feature matching. Extensive experiments show that our proposed
TransPillars achieves state-of-the-art performance compared with existing
multi-frame detection approaches. Code will be released.
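The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes two bird's-eye-view feature maps stored as nested lists (`[row][col][channel]`) with even height and width, uses 2x2 average pooling to form the coarse level, and swaps in plain windowed dot-product attention for the paper's deformable transformer variant. All function names (`softmax`, `avg_pool2`, `attend`, `coarse_to_fine_aggregate`) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def avg_pool2(fmap):
    """Build the coarse level by 2x2 average pooling (H and W assumed even)."""
    dim = len(fmap[0][0])
    return [
        [
            [sum(fmap[2 * y + a][2 * x + b][i] for a in (0, 1) for b in (0, 1)) / 4
             for i in range(dim)]
            for x in range(len(fmap[0]) // 2)
        ]
        for y in range(len(fmap) // 2)
    ]

def attend(query, prev_map, center, radius):
    """Attend from one query cell to a window of the previous frame.

    `center` must lie inside `prev_map`. Returns the attention-weighted
    feature and the window offset that received the largest weight.
    """
    h, w = len(prev_map), len(prev_map[0])
    cy, cx = center
    keys, offsets = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < h and 0 <= x < w:
                keys.append(prev_map[y][x])
                offsets.append((dy, dx))
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    fused = [sum(wt * key[i] for wt, key in zip(weights, keys))
             for i in range(len(query))]
    best = offsets[max(range(len(weights)), key=weights.__getitem__)]
    return fused, best

def coarse_to_fine_aggregate(cur_fine, prev_fine, radius=1):
    """Fuse previous-frame features into the current frame, coarse level first."""
    cur_coarse, prev_coarse = avg_pool2(cur_fine), avg_pool2(prev_fine)
    h, w = len(cur_fine), len(cur_fine[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Coarse stage: estimate a rough motion offset at half resolution.
            _, (dy, dx) = attend(cur_coarse[y // 2][x // 2], prev_coarse,
                                 (y // 2, x // 2), radius)
            # Fine stage: search the previous fine map around the location the
            # coarse offset points to (scaled by 2 and clamped to the grid).
            cy = min(max(y + 2 * dy, 0), h - 1)
            cx = min(max(x + 2 * dx, 0), w - 1)
            fused, _ = attend(cur_fine[y][x], prev_fine, (cy, cx), radius)
            # Residual fusion of current and aggregated previous-frame features.
            row.append([c + f for c, f in zip(cur_fine[y][x], fused)])
        out.append(row)
    return out
```

In this sketch the coarse stage lets a fine-level query reach previous-frame cells that lie outside its own small search window, which is the role the abstract assigns to coarse features when they guide the aggregation of fine ones.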
Related papers
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
(2023-12-13T18:59:13Z)
- STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking [11.901758708579642]
3D single object tracking with point clouds is a critical task in 3D computer vision.
Previous methods usually take the last two frames as input, using the template point cloud from the previous frame and the search-area point cloud from the current frame.
(2023-06-30T07:25:11Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
(2023-03-14T02:58:27Z)
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
(2023-01-06T18:52:12Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
(2022-08-24T16:54:38Z)
- Boosting Single-Frame 3D Object Detection by Simulating Multi-Frame
Point Clouds [47.488158093929904]
We present a new approach to train a detector to simulate features and responses following a detector trained on multi-frame point clouds.
Our approach needs multi-frame point clouds only when training the single-frame detector; once trained, it can detect objects from single-frame point clouds alone at inference time.
(2022-07-03T12:59:50Z)
- Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking
from View Aggregation [8.854112907350624]
3D multi-object tracking plays a vital role in autonomous navigation.
Many approaches detect objects in 2D RGB sequences for tracking, which lacks reliability when localizing objects in 3D space.
We propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in the adjacent frames.
(2020-11-25T16:14:40Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message
Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection, ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
(2020-04-03T06:06:52Z)