VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial
Attention
- URL: http://arxiv.org/abs/2203.09704v1
- Date: Fri, 18 Mar 2022 02:34:59 GMT
- Title: VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial
Attention
- Authors: Shengheng Deng, Zhihao Liang, Lin Sun and Kui Jia
- Abstract summary: We propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA).
The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one.
At the time of submission, our method achieves 63.0% in overall mAP and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods by up to 24% in safety-crucial categories such as cyclist.
- Score: 32.44687996180621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting objects from LiDAR point clouds is of tremendous significance in
autonomous driving. In spite of good progress, accurate and reliable 3D
detection is yet to be achieved due to the sparsity and irregularity of LiDAR
point clouds. Among existing strategies, multi-view methods have shown great
promise by leveraging the more comprehensive information from both bird's eye
view (BEV) and range view (RV). These multi-view methods either refine
proposals predicted from a single view via fused features, or fuse features
without considering the global spatial context; their performance is
consequently limited. In this paper, we propose to adaptively fuse multi-view features
in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). The
proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer
perceptron widely adopted in standard attention modules is replaced with a
convolutional one. Thanks to the learned attention mechanism, VISTA can produce
fused features of high quality for proposal prediction. We decouple the
classification and regression tasks in VISTA and apply an additional
attention-variance constraint that enables the attention module to focus on
specific targets instead of generic points. We conduct thorough experiments on
the benchmarks of nuScenes and Waymo; results confirm the efficacy of our
designs. At the time of submission, our method achieves 63.0% in overall mAP
and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods
by up to 24% in safety-crucial categories such as cyclist. The source code in
PyTorch is available at https://github.com/Gorilla-Lab-SCUT/VISTA
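The abstract's core idea, replacing the linear (MLP) projections of standard attention with convolutional ones so that cross-view fusion retains local spatial context, can be sketched as below. This is a minimal, hypothetical reading of the mechanism, not the authors' implementation: the class name, shapes, and the choice of 3x3 projections are assumptions.

```python
import torch
import torch.nn as nn

class ConvCrossViewAttention(nn.Module):
    """Hedged sketch of cross-view attention with convolutional projections.

    Queries come from the BEV feature map, keys/values from the RV feature
    map; the usual per-token linear projections are swapped for 3x3
    convolutions, as the VISTA abstract describes.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Convolutional projections preserve local spatial context in each view.
        self.q_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.k_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.v_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.scale = channels ** -0.5

    def forward(self, bev_feat: torch.Tensor, rv_feat: torch.Tensor) -> torch.Tensor:
        # bev_feat: (B, C, Hb, Wb) supplies queries; rv_feat: (B, C, Hr, Wr)
        # supplies keys and values.
        B, C, Hb, Wb = bev_feat.shape
        q = self.q_conv(bev_feat).flatten(2).transpose(1, 2)  # (B, Hb*Wb, C)
        k = self.k_conv(rv_feat).flatten(2)                   # (B, C, Hr*Wr)
        v = self.v_conv(rv_feat).flatten(2).transpose(1, 2)   # (B, Hr*Wr, C)
        # Global cross-view attention: every BEV location attends to all RV locations.
        attn = torch.softmax(q @ k * self.scale, dim=-1)      # (B, Hb*Wb, Hr*Wr)
        fused = (attn @ v).transpose(1, 2).reshape(B, C, Hb, Wb)
        return fused
```

The fused output keeps the BEV spatial layout, so it can be fed directly to a BEV detection head; for the actual module (including the decoupled heads and the attention-variance constraint) see the linked repository.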
Related papers
- DVPE: Divided View Position Embedding for Multi-View 3D Object Detection [7.791229698270439]
Current research faces challenges in balancing between receptive fields and reducing interference when aggregating multi-view features.
This paper proposes a divided-view method, in which features are modeled globally via a visibility cross-attention mechanism but interact only with partial features in a divided local virtual space.
Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set.
arXiv Detail & Related papers (2024-07-24T02:44:41Z) - Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge [20.459377705070043]
This report presents the 1st winning model for UG2+, a task in CVPR 2024 UAV Tracking and Pose-Estimation Challenge.
We propose a multi-modal UAV detection, classification, and 3D tracking method for accurate UAV classification and tracking.
Our system integrates cutting-edge classification techniques and sophisticated post-processing steps to boost accuracy and robustness.
arXiv Detail & Related papers (2024-05-26T07:21:18Z) - Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass its point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z) - SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
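The deformable sampling idea in the SA-Det3D summary, learning offsets over randomly sampled locations to pick representative features, could be sketched roughly as follows. Every name, shape, and design choice here is an illustrative assumption based only on the one-line summary above, not the paper's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableFeatureSampler(nn.Module):
    """Hedged sketch: sample random locations on a feature map, then learn a
    per-location 2D offset from the sampled feature and re-sample at the
    deformed locations to gather representative features."""

    def __init__(self, channels: int, num_samples: int):
        super().__init__()
        self.num_samples = num_samples
        # Predict a 2D offset for each sampled location from its feature vector.
        self.offset_mlp = nn.Linear(channels, 2)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); locations are in grid_sample's normalized [-1, 1] range.
        B, C, H, W = feat.shape
        base = torch.rand(B, self.num_samples, 2, device=feat.device) * 2 - 1
        # Bilinearly gather features at the random base locations.
        sampled = F.grid_sample(
            feat, base.unsqueeze(1), align_corners=False
        ).squeeze(2).transpose(1, 2)  # (B, num_samples, C)
        # Learn bounded offsets and re-sample at the deformed locations.
        deformed = (base + torch.tanh(self.offset_mlp(sampled))).clamp(-1.0, 1.0)
        out = F.grid_sample(
            feat, deformed.unsqueeze(1), align_corners=False
        ).squeeze(2).transpose(1, 2)  # (B, num_samples, C)
        return out
```

The deformed samples would then feed a downstream attention block over the reduced feature set, which is what makes the variant cheaper than dense pairwise self-attention.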
arXiv Detail & Related papers (2021-01-07T18:30:32Z) - Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D
Vehicle Detection from Point Cloud [39.99118618229583]
We propose a unified model SegVoxelNet to address the above two problems.
A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view.
A novel depth-aware head is designed to explicitly model the distribution differences and each part of the depth-aware head is made to focus on its own target detection range.
arXiv Detail & Related papers (2020-02-13T02:42:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.