LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR
Perception
- URL: http://arxiv.org/abs/2303.12194v2
- Date: Sat, 2 Mar 2024 22:18:12 GMT
- Title: LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR
Perception
- Authors: Zixiang Zhou, Dongqiangzi Ye, Weijia Chen, Yufei Xie, Yu Wang, Panqu
Wang, Hassan Foroosh
- Abstract summary: We introduce a new LiDAR multi-task learning paradigm based on the transformer.
LiDARFormer exploits cross-task synergy to boost the performance of LiDAR perception tasks.
LiDARFormer is evaluated on the large-scale nuScenes and Waymo Open datasets for both 3D detection and semantic segmentation tasks.
- Score: 15.919789515451615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a recent trend in the LiDAR perception field towards unifying
multiple tasks in a single strong network with improved performance, as opposed
to using separate networks for each task. In this paper, we introduce a new
LiDAR multi-task learning paradigm based on the transformer. The proposed
LiDARFormer utilizes cross-space global contextual feature information and
exploits cross-task synergy to boost the performance of LiDAR perception tasks
across multiple large-scale datasets and benchmarks. Our novel
transformer-based framework includes a cross-space transformer module that
learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D
sparse voxel feature maps. Additionally, we propose a transformer decoder for
the segmentation task to dynamically adjust the learned features by leveraging
the categorical feature representations. Furthermore, we combine the
segmentation and detection features in a shared transformer decoder with
cross-task attention layers to enhance and integrate the object-level and
class-level features. LiDARFormer is evaluated on the large-scale nuScenes and
the Waymo Open datasets for both 3D detection and semantic segmentation tasks,
and it outperforms all previously published methods on both tasks. Notably,
LiDARFormer achieves state-of-the-art performance of 76.4% L2 mAPH and
74.3% NDS on the challenging Waymo and nuScenes detection benchmarks,
respectively, as a single-model, LiDAR-only method.
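To make the two attention mechanisms described above concrete, the sketch below shows (a) dense 2D BEV features attending to sparse 3D voxel features (cross-space) and (b) object-level detection queries attending to class-level segmentation queries (cross-task). It is a minimal PyTorch illustration under assumed shapes, layer counts, and placeholder class names, not the authors' released implementation.

```python
# Minimal sketch of the cross-space and cross-task attention ideas named in the
# abstract. All shapes, dimensions, and class names are illustrative assumptions.
import torch
import torch.nn as nn


class CrossSpaceAttention(nn.Module):
    """Dense 2D BEV features attend to sparse 3D voxel features (cross-space)."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev: torch.Tensor, voxels: torch.Tensor) -> torch.Tensor:
        # bev: (B, C, H, W) dense map; voxels: (B, N, C) features of occupied voxels
        b, c, h, w = bev.shape
        q = bev.flatten(2).transpose(1, 2)              # (B, H*W, C) queries
        out, _ = self.attn(q, voxels, voxels)           # keys/values from 3D space
        out = self.norm(q + out)                        # residual + norm
        return out.transpose(1, 2).reshape(b, c, h, w)  # back to BEV layout


class CrossTaskAttention(nn.Module):
    """Detection (object-level) queries attend to segmentation (class-level) queries."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, det_q: torch.Tensor, seg_q: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(det_q, seg_q, seg_q)
        return self.norm(det_q + out)


# Toy usage: 2 scans, a 64x64 BEV grid, 500 occupied voxels, 200 object queries,
# and 16 class queries, all with 128 channels.
bev = CrossSpaceAttention()(torch.randn(2, 128, 64, 64), torch.randn(2, 500, 128))
det = CrossTaskAttention()(torch.randn(2, 200, 128), torch.randn(2, 16, 128))
print(bev.shape, det.shape)  # torch.Size([2, 128, 64, 64]) torch.Size([2, 200, 128])
```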
Related papers
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z)
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- Small, Versatile and Mighty: A Range-View Perception Framework [13.85089181673372]
We propose a novel multi-task framework for 3D detection of LiDAR data.
Our framework integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud.
Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open dataset.
arXiv Detail & Related papers (2024-03-01T07:02:42Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantic segmentation, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module that selectively transfers semantic features to improve object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- AOP-Net: All-in-One Perception Network for Joint LiDAR-based 3D Object Detection and Panoptic Segmentation [9.513467995188634]
AOP-Net is a LiDAR-based multi-task framework that combines 3D object detection and panoptic segmentation.
AOP-Net achieves state-of-the-art performance among published works on the nuScenes benchmark for both 3D object detection and panoptic segmentation tasks.
arXiv Detail & Related papers (2023-02-02T05:31:53Z)
- LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception [15.785527155108966]
LidarMultiNet is a LiDAR-based multi-task network that unifies 3D object detection, semantic segmentation, and panoptic segmentation.
At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module.
LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset.
arXiv Detail & Related papers (2022-09-19T23:39:15Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once trained, it needs only LiDAR data at inference (a minimal sketch of this feature-mimicking idea appears after this list).
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
We further extend DS-Net to 4D panoptic LiDAR segmentation via temporally unified instance clustering on aligned LiDAR frames.
DS-Net achieves superior accuracy over current state-of-the-art methods in both tasks.
arXiv Detail & Related papers (2022-03-14T15:25:42Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
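As referenced in the "Boosting 3D Object Detection by Simulating Multimodality on Point Clouds" entry above, the sketch below illustrates the train-time-only feature-mimicking idea in minimal form: a LiDAR-only student is regressed toward features from a frozen LiDAR-image teacher, so only LiDAR data is needed at inference. The placeholder encoders, shapes, and the plain L2 loss are assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch of train-time feature mimicking: the LiDAR-only student is pushed
# toward the frozen multi-modal teacher's BEV features. Encoders are stand-ins.
import torch
import torch.nn as nn


def mimic_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    # Simple L2 feature-imitation term; the real method may also match responses.
    return nn.functional.mse_loss(student_feat, teacher_feat.detach())


# Placeholder encoders producing BEV feature maps of the same shape (B, C, H, W).
student = nn.Conv2d(64, 128, 3, padding=1)         # LiDAR-only branch (trainable)
teacher = nn.Conv2d(64, 128, 3, padding=1).eval()  # frozen LiDAR-image branch
for p in teacher.parameters():
    p.requires_grad_(False)

lidar_bev = torch.randn(2, 64, 64, 64)             # pseudo LiDAR-only BEV input
fused_bev = torch.randn(2, 64, 64, 64)             # pseudo LiDAR+image BEV input

loss = mimic_loss(student(lidar_bev), teacher(fused_bev))
loss.backward()                                    # only the student gets gradients
```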