VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial
Attention
- URL: http://arxiv.org/abs/2203.09704v1
- Date: Fri, 18 Mar 2022 02:34:59 GMT
- Title: VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial
Attention
- Authors: Shengheng Deng, Zhihao Liang, Lin Sun and Kui Jia
- Abstract summary: We propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA)
The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one.
At the time of submission, our method achieves 63.0% in overall mAP and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods by up to 24% in safety-crucial categories such as cyclist.
- Score: 32.44687996180621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting objects from LiDAR point clouds is of tremendous significance in
autonomous driving. In spite of good progress, accurate and reliable 3D
detection is yet to be achieved due to the sparsity and irregularity of LiDAR
point clouds. Among existing strategies, multi-view methods have shown great
promise by leveraging the more comprehensive information from both bird's eye
view (BEV) and range view (RV). These multi-view methods either refine the
proposals predicted from a single view via fused features, or fuse the features
without considering the global spatial context; consequently, their performance
is limited. In this paper, we propose to adaptively fuse multi-view features
in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). The
proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer
perceptron widely adopted in standard attention modules is replaced with a
convolutional one. Thanks to the learned attention mechanism, VISTA can produce
fused features of high quality for proposal prediction. We decouple the
classification and regression tasks in VISTA, and apply an additional constraint
on attention variance that enables the attention module to focus on specific
targets instead of generic points. We conduct thorough experiments on
the benchmarks of nuScenes and Waymo; results confirm the efficacy of our
designs. At the time of submission, our method achieves 63.0% in overall mAP
and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods
by up to 24% in safety-crucial categories such as cyclist. The source code in
PyTorch is available at https://github.com/Gorilla-Lab-SCUT/VISTA
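For a concrete picture of the fusion module described above, below is a minimal PyTorch sketch of a cross-view attention block in the spirit of VISTA: the query/key/value projections use convolutions instead of the per-token MLPs of standard attention, and a toy variance-based regularizer stands in for the attention-variance constraint. All class and function names here are illustrative assumptions, not the authors' implementation; see the official repository linked above for the actual code.

```python
# Minimal sketch of a convolutional cross-view attention block in the spirit
# of VISTA. Names and shapes are illustrative assumptions, not the authors'
# implementation (see https://github.com/Gorilla-Lab-SCUT/VISTA).
import torch
import torch.nn as nn


class ConvCrossViewAttention(nn.Module):
    """Cross-view attention in which the per-token MLP projections of standard
    attention are replaced with 3x3 convolutions, so queries/keys/values keep
    local spatial context from each view's feature map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        # Convolutional projections instead of linear (MLP) ones.
        self.to_q = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_k = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_v = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, query_view: torch.Tensor, context_view: torch.Tensor):
        # query_view:   (B, C, Hq, Wq) features of the view to enhance (e.g. BEV)
        # context_view: (B, C, Hc, Wc) features of the other view (e.g. RV)
        b, c, hq, wq = query_view.shape

        def split_heads(x):
            # (B, C, H, W) -> (B, heads, head_dim, H*W)
            return x.flatten(2).view(b, self.num_heads, self.head_dim, -1)

        q = split_heads(self.to_q(query_view))    # (B, H, D, Nq)
        k = split_heads(self.to_k(context_view))  # (B, H, D, Nc)
        v = split_heads(self.to_v(context_view))  # (B, H, D, Nc)
        attn = torch.softmax(
            torch.einsum("bhdq,bhdk->bhqk", q, k) / self.head_dim ** 0.5, dim=-1
        )                                          # (B, H, Nq, Nc)
        out = torch.einsum("bhqk,bhdk->bhdq", attn, v).reshape(b, c, hq, wq)
        return self.proj(out) + query_view, attn


def attention_variance_penalty(attn: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for the attention-variance constraint: reward attention
    rows that are peaked on a few keys (high variance across keys) so the
    module attends to specific targets rather than generic points. The exact
    form used in the paper may differ."""
    return -attn.var(dim=-1).mean()
```

Under this sketch, the paper's decoupling of classification and regression would correspond to running separate attention blocks (or output heads) for the two tasks and adding the variance penalty to the training loss; the details of the real module may differ.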
Related papers
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV)
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge [20.459377705070043]
This report presents the first-place winning model for UG2+, a task in the CVPR 2024 UAV Tracking and Pose-Estimation Challenge.
We propose a multi-modal UAV detection, classification, and 3D tracking method for accurate UAV classification and tracking.
Our system integrates cutting-edge classification techniques and sophisticated post-processing steps to boost accuracy and robustness.
arXiv Detail & Related papers (2024-05-26T07:21:18Z) - PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework that fuses information from RGB images and LiDAR point clouds at the points of interest (PoIs)
Our approach maintains the view of each modality and obtains multi-modal features via computation-friendly projection.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z) - Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information, employing multiple frames to detect objects and track them within a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z) - SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z) - Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is an emerging research theme in autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D
Vehicle Detection from Point Cloud [39.99118618229583]
We propose a unified model, SegVoxelNet, to address these problems.
A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view.
A novel depth-aware head is designed to explicitly model the distribution differences, and each part of the head is made to focus on its own target detection range.
arXiv Detail & Related papers (2020-02-13T02:42:31Z)