Li3DeTr: A LiDAR based 3D Detection Transformer
- URL: http://arxiv.org/abs/2210.15365v1
- Date: Thu, 27 Oct 2022 12:23:54 GMT
- Title: Li3DeTr: A LiDAR based 3D Detection Transformer
- Authors: Gopi Krishna Erabati and Helder Araujo
- Abstract summary: Li3DeTr is an end-to-end LiDAR based 3D Detection Transformer for autonomous driving.
The Li3DeTr network achieves 61.3% mAP and 67.6% NDS.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by recent advances in vision transformers for object detection, we
propose Li3DeTr, an end-to-end LiDAR based 3D Detection Transformer for
autonomous driving, which takes LiDAR point clouds as input and regresses 3D
bounding boxes. The LiDAR local and global features are encoded using sparse
convolution and multi-scale deformable attention, respectively. In the decoder head,
firstly, in the novel Li3DeTr cross-attention block, we link the LiDAR global
features to 3D predictions leveraging the sparse set of object queries learnt
from the data. Secondly, the object query interactions are formulated using
multi-head self-attention. Finally, the decoder layer is repeated $L_{dec}$
times to refine the object queries. Inspired by DETR, we employ
set-to-set loss to train the Li3DeTr network. Without bells and whistles, the
Li3DeTr network achieves 61.3% mAP and 67.6% NDS on the nuScenes dataset,
surpassing state-of-the-art methods that use non-maximum suppression (NMS), and
achieves competitive performance on the KITTI dataset. We
also employ knowledge distillation (KD) with a teacher and a student model,
which slightly improves the performance of our network.
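As a rough illustration of the decoder flow described above, here is a minimal PyTorch sketch: learnt object queries attend to each other (self-attention), then to the LiDAR global features (cross-attention), and the layer is repeated $L_{dec}$ times before classification and box-regression heads. Module names, dimensions, and the use of standard multi-head attention in place of the paper's Li3DeTr cross-attention block are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the decoder flow described in the abstract. Dimensions,
# names, and the use of nn.MultiheadAttention (instead of the paper's
# Li3DeTr cross-attention block) are assumptions.
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, queries, lidar_global_feats):
        # Object query interactions via multi-head self-attention.
        q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        # Link the LiDAR global features to the queries (cross-attention).
        q = self.norm2(q + self.cross_attn(q, lidar_global_feats, lidar_global_feats)[0])
        return self.norm3(q + self.ffn(q))

class DetectionHead(nn.Module):
    def __init__(self, d_model=256, n_classes=10, box_dim=9, L_dec=6, n_queries=300):
        super().__init__()
        self.queries = nn.Embedding(n_queries, d_model)  # sparse object queries learnt from data
        self.layers = nn.ModuleList(DecoderLayer(d_model) for _ in range(L_dec))
        self.cls_head = nn.Linear(d_model, n_classes)
        self.box_head = nn.Linear(d_model, box_dim)  # e.g. centre, size, yaw, velocity

    def forward(self, lidar_global_feats):
        b = lidar_global_feats.shape[0]
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        for layer in self.layers:  # the decoder layer is repeated L_dec times
            q = layer(q, lidar_global_feats)
        return self.cls_head(q), self.box_head(q)
```

In the paper the global features come from a multi-scale deformable-attention encoder over sparse-convolution features; here they are simply a `(B, N, d_model)` tensor.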
Related papers
- Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection [9.076003184833557]
We propose the LiDAR-Camera Augmentation Network (LCANet), a novel framework that reconstructs LiDAR point cloud data by fusing 2D image features.
LCANet fuses LiDAR and camera data by projecting image features into 3D space, integrating semantic information into the point cloud.
This fusion effectively compensates for LiDAR's weakness in detecting objects at long distances, which are often represented by sparse points.
arXiv Detail & Related papers (2024-09-23T13:03:31Z)
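The projection step that the LCANet summary above describes can be illustrated with a generic sketch: LiDAR points are transformed into the camera frame, projected through the intrinsics, and each point samples the image feature at its pixel. The function name, calibration inputs, and nearest-pixel sampling are assumptions, not the paper's implementation.

```python
# Generic sketch of LiDAR-camera fusion by projection: each 3D point is mapped
# into the image plane and picks up the image feature at that pixel. The
# calibration matrices and feature-map layout are illustrative assumptions.
import torch

def sample_image_features(points, feat_map, K, T_cam_from_lidar):
    """points: (N, 3) LiDAR xyz; feat_map: (C, H, W); K: (3, 3) intrinsics;
    T_cam_from_lidar: (4, 4) LiDAR-to-camera extrinsics."""
    N = points.shape[0]
    homo = torch.cat([points, points.new_ones(N, 1)], dim=1)   # (N, 4) homogeneous
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]                 # camera frame
    in_front = cam[:, 2] > 1e-3                                # keep points with z > 0
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                                # perspective divide -> pixels
    C, H, W = feat_map.shape
    u = uv[:, 0].round().long().clamp(0, W - 1)
    v = uv[:, 1].round().long().clamp(0, H - 1)
    feats = feat_map[:, v, u].T                                # (N, C) per-point features
    feats[~in_front] = 0.0                                     # zero out points behind camera
    return feats
```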
- Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene [22.297964850282177]
We propose LiDAR-2D Self-paced Learning (LiSe) for unsupervised 3D detection.
RGB images serve as a valuable complement to LiDAR data, offering precise 2D localization cues.
Our framework devises a self-paced learning pipeline that incorporates adaptive sampling and weak model aggregation strategies.
arXiv Detail & Related papers (2024-07-11T14:58:49Z)
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z)
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
- LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception [15.919789515451615]
We introduce a new LiDAR multi-task learning paradigm based on the transformer.
LiDARFormer exploits cross-task synergy to boost the performance of LiDAR perception tasks.
LiDARFormer is evaluated on the large-scale nuScenes and Waymo Open datasets for both 3D detection and semantic segmentation tasks.
arXiv Detail & Related papers (2023-03-21T20:52:02Z)
- MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving [0.0]
We propose MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer architecture to fuse image and LiDAR features to improve the detection accuracy.
Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes.
The MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian-algorithm-based bipartite matching and a set-to-set loss inspired by DETR.
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
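The Hungarian bipartite matching and set-to-set loss that MSF3DDETR (and Li3DeTr) borrow from DETR can be sketched as follows. The cost weights and the omission of the "no object" term for unmatched queries are simplifying assumptions.

```python
# Sketch of DETR-style set-to-set training: the Hungarian algorithm pairs each
# ground-truth box with its cheapest prediction, and the loss is computed over
# matched pairs only. Cost weights here are arbitrary assumptions.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def set_to_set_loss(cls_logits, boxes, gt_labels, gt_boxes, cls_w=1.0, box_w=5.0):
    """cls_logits: (Q, K); boxes: (Q, D); gt_labels: (G,); gt_boxes: (G, D)."""
    prob = cls_logits.softmax(-1)
    cost_cls = -prob[:, gt_labels]                        # (Q, G) classification cost
    cost_box = torch.cdist(boxes, gt_boxes, p=1)          # (Q, G) L1 box cost
    cost = (cls_w * cost_cls + box_w * cost_box).detach().cpu().numpy()
    pred_idx, gt_idx = linear_sum_assignment(cost)        # optimal one-to-one matching
    loss_cls = F.cross_entropy(cls_logits[pred_idx], gt_labels[gt_idx])
    loss_box = F.l1_loss(boxes[pred_idx], gt_boxes[gt_idx])
    return cls_w * loss_cls + box_w * loss_box
```

Because the matching is one-to-one, the network needs no NMS: each ground-truth object pulls exactly one query toward it during training.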
- Multimodal Transformer for Automatic 3D Annotation and Object Detection [27.92241487946078]
We propose an end-to-end multimodal transformer (MTrans) autolabeler to generate precise 3D box annotations from weak 2D bounding boxes.
With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously.
By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively.
arXiv Detail & Related papers (2022-07-20T10:38:29Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
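Feature-level distillation of the kind described above (and the KD that Li3DeTr itself employs) can be sketched generically: a LiDAR-only student is trained to mimic a multi-modal teacher's intermediate features through a small adapter. The adapter design and MSE loss are assumptions; the papers' actual distillation targets may differ.

```python
# Generic sketch of feature-level knowledge distillation for a teacher-student
# setup: the student's features are lifted into the teacher's channel space and
# regressed toward the (frozen) teacher features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student_dim=128, teacher_dim=256):
        super().__init__()
        # 1x1 conv adapter maps student features into the teacher's channel space.
        self.adapter = nn.Conv2d(student_dim, teacher_dim, kernel_size=1)

    def forward(self, student_feats, teacher_feats):
        # Gradients flow only into the student; the teacher acts as a fixed target.
        return F.mse_loss(self.adapter(student_feats), teacher_feats.detach())
```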
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performance on two large-scale datasets, KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
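The voxelization step that precedes SelfVoxeLO's 3D convolutions can be illustrated with a generic NumPy sketch; the grid resolution, range, and mean-pooling of per-voxel points are illustrative choices, not the paper's exact pipeline.

```python
# Generic sketch of voxelizing raw LiDAR points into a sparse 3D grid, the
# usual preprocessing before 3D convolutions. Ranges and resolution are
# illustrative assumptions.
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.2),
             pc_range=(-50.0, -50.0, -5.0, 50.0, 50.0, 3.0)):
    """points: (N, 3) xyz. Returns occupied voxel indices and per-voxel centroids."""
    lo, hi = np.array(pc_range[:3]), np.array(pc_range[3:])
    size = np.array(voxel_size)
    mask = np.all((points >= lo) & (points < hi), axis=1)  # crop to the range
    pts = points[mask]
    idx = np.floor((pts - lo) / size).astype(np.int64)     # voxel index per point
    uniq, inv = np.unique(idx, axis=0, return_inverse=True)
    # Mean-pool the points falling into each occupied voxel.
    sums = np.zeros((len(uniq), 3))
    counts = np.zeros(len(uniq))
    np.add.at(sums, inv, pts)
    np.add.at(counts, inv, 1)
    return uniq, sums / counts[:, None]
```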
- End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
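The change of representation at the heart of pseudo-LiDAR, converting a 2D depth map into a 3D point cloud, is a pinhole back-projection (the paper's CoR modules make this step differentiable). A minimal sketch, assuming known camera intrinsics:

```python
# Sketch of the pseudo-LiDAR representation change: a predicted depth map is
# back-projected through the camera intrinsics into a 3D point cloud that a
# LiDAR-style detector can consume. The intrinsics are placeholders.
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """depth: (H, W) metric depth map; K: (3, 3) camera intrinsics."""
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grids
    z = depth
    x = (u - cx) * z / fx    # back-project along the camera x-axis
    y = (v - cy) * z / fy    # back-project along the camera y-axis
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # (H*W, 3) points
```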