Attention-based Proposals Refinement for 3D Object Detection
- URL: http://arxiv.org/abs/2201.07070v1
- Date: Tue, 18 Jan 2022 15:50:31 GMT
- Title: Attention-based Proposals Refinement for 3D Object Detection
- Authors: Minh-Quan Dao, Elwan Héry, Vincent Frémont
- Abstract summary: This paper takes a more data-driven approach to ROI feature extraction using the attention mechanism.
Experiments on the KITTI validation set show that our method achieves competitive performance of 84.84 AP for class Car at Moderate difficulty.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Safe autonomous driving technology heavily depends on accurate 3D object
detection since it produces input to safety-critical downstream tasks such as
prediction and navigation. Recent advances in this field have been made by developing
the refinement stage of voxel-based region proposal networks to better strike
the balance between accuracy and efficiency. A popular approach among
state-of-the-art frameworks is to divide each proposal, or Region of Interest
(ROI), into a grid and extract a feature for each grid location before
synthesizing them into the ROI feature. While achieving impressive performance, such
an approach involves a number of hand-crafted components (e.g. grid sampling,
set abstraction) which require expert knowledge to be tuned correctly. This
paper takes a more data-driven approach to ROI feature extraction using the
attention mechanism. Specifically, points inside an ROI are positionally encoded
to incorporate the ROI's geometry. The resulting position encodings and the point
features are transformed into the ROI feature via vector attention. Unlike the
original multi-head attention, vector attention assigns different weights to
different channels within a point feature, and is thus able to capture a more
sophisticated relation between pooled points and the ROI. Experiments on the KITTI
validation set show that our method achieves competitive performance
of 84.84 AP for class Car at Moderate difficulty while having the fewest
parameters compared to closely related methods and attaining quasi-real-time
inference speed at 15 FPS on an NVIDIA V100 GPU. The code will be released.
Related papers
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection
in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z) - UnLoc: A Universal Localization Method for Autonomous Vehicles using
LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z) - Correlation Pyramid Network for 3D Single Object Tracking [16.694809791177263]
We propose a novel Correlation Pyramid Network (CorpNet) with a unified encoder and a motion-factorized decoder.
CorpNet achieves state-of-the-art results while running in real-time.
arXiv Detail & Related papers (2023-05-16T06:07:20Z) - FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z) - SoK: Vehicle Orientation Representations for Deep Rotation Estimation [2.052323405257355]
We study the accuracy performance of various existing orientation representations using the KITTI 3D object detection dataset.
We propose a new form of orientation representation: Tricosine.
arXiv Detail & Related papers (2021-12-08T17:12:54Z) - Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object
Detection [89.66162518035144]
We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds.
We propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.
Our pyramid RoI head is robust to the sparse and imbalanced circumstances, and can be applied upon various 3D backbones to consistently boost the detection performance.
arXiv Detail & Related papers (2021-09-06T14:17:51Z) - MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z) - MRDet: A Multi-Head Network for Accurate Oriented Object Detection in
Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.