Small, Versatile and Mighty: A Range-View Perception Framework
- URL: http://arxiv.org/abs/2403.00325v1
- Date: Fri, 1 Mar 2024 07:02:42 GMT
- Title: Small, Versatile and Mighty: A Range-View Perception Framework
- Authors: Qiang Meng, Xiao Wang, JiaBao Wang, Liujiang Yan, Ke Wang
- Abstract summary: We propose a novel multi-task framework for 3D detection from LiDAR data.
Our framework integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud.
Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open Dataset.
- Score: 13.85089181673372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its compactness and information integrity, the range-view
representation of LiDAR data is rarely the first choice for 3D perception
tasks. In this work, we further push the envelope of the range-view
representation with a novel multi-task framework, achieving unprecedented 3D
detection performance. Our proposed Small, Versatile, and Mighty (SVM) network
uses a pure convolutional architecture to fully unleash the efficiency and
multi-tasking potential of the range-view representation. To boost detection
performance, we first propose a range-view-specific Perspective Centric Label
Assignment (PCLA) strategy and a novel View Adaptive Regression (VAR) module
that further refines hard-to-predict box properties. In addition, our
framework seamlessly integrates semantic segmentation and panoptic
segmentation of the LiDAR point cloud without extra modules. Among
range-view-based methods, our model achieves new state-of-the-art detection
performance on the Waymo Open Dataset; notably, it gains over 10 mAP on the
vehicle class compared with convolutional counterparts. Our results on the
other tasks further demonstrate the multi-task capability of the proposed
small but mighty framework.
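For context, the range-view representation the abstract builds on is the dense 2D image produced by spherically projecting a LiDAR sweep. The minimal NumPy sketch below illustrates that projection; the sensor geometry (64 beams, a +3/-25 degree vertical field of view) and all function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def to_range_image(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Spherically project an (N, 3) LiDAR point cloud onto an H x W range image.

    Sensor geometry (beam count, vertical FOV) is an illustrative assumption;
    the real values depend on the LiDAR used.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)         # range per point
    yaw = np.arctan2(y, x)                            # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.clip(r, 1e-6, None))     # elevation angle

    fov_up_rad, fov_down_rad = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(np.int32)   # image column
    v = ((fov_up_rad - pitch) / fov * H).astype(np.int32)  # image row
    u, v = np.clip(u, 0, W - 1), np.clip(v, 0, H - 1)

    image = np.full((H, W), -1.0, dtype=np.float32)
    # Write points far-to-near so the closest point wins each pixel
    # (this resolves the "many-to-one" collisions of range views).
    order = np.argsort(r)[::-1]
    image[v[order], u[order]] = r[order]
    return image
```

Keeping only the nearest return per pixel, as in the final lines, is one common way to resolve pixel collisions; a network like the one described above can then run ordinary 2D convolutions over the resulting image.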
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
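The summary above does not detail PVAFN's multi-pooling design; the PyTorch sketch below shows one generic reading, pooling an RoI feature map at several grid sizes with both max and average pooling before fusing. The class name, channel counts, and grid sizes are all assumptions for illustration, not PVAFN's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPoolFusion(nn.Module):
    """Hypothetical multi-pooling block: pools an RoI feature map at several
    grid sizes with both max and average pooling, then fuses the results."""
    def __init__(self, in_ch, out_ch, grid_sizes=(1, 2, 4)):
        super().__init__()
        self.grid_sizes = grid_sizes
        fused = sum(2 * in_ch * g * g for g in grid_sizes)  # max + avg per scale
        self.fuse = nn.Linear(fused, out_ch)

    def forward(self, roi_feat):                 # roi_feat: (B, C, H, W)
        pooled = []
        for g in self.grid_sizes:
            pooled.append(F.adaptive_max_pool2d(roi_feat, g).flatten(1))
            pooled.append(F.adaptive_avg_pool2d(roi_feat, g).flatten(1))
        return self.fuse(torch.cat(pooled, dim=1))
```

- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]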
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
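PAttFormer's defining trait per the summary is a single shared encoder feeding several task heads. A minimal PyTorch sketch of that pattern follows; the layer sizes, head structure, and names are invented for illustration and are not PAttFormer's actual architecture.

```python
import torch.nn as nn

class SharedBackboneMultiTask(nn.Module):
    """Hypothetical multi-task net: one shared point-feature encoder,
    separate lightweight heads for segmentation and detection."""
    def __init__(self, in_ch=4, feat_ch=128, n_classes=20, box_params=7):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for a point encoder
            nn.Linear(in_ch, feat_ch), nn.ReLU(),
            nn.Linear(feat_ch, feat_ch), nn.ReLU(),
        )
        self.seg_head = nn.Linear(feat_ch, n_classes)   # per-point labels
        self.det_head = nn.Linear(feat_ch, box_params)  # per-point box regression

    def forward(self, points):                  # points: (N, in_ch)
        feat = self.encoder(points)
        return self.seg_head(feat), self.det_head(feat)
```

- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]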
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantic segmentation, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
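SWAG's internals are not given in the summary above; one plausible reading, using semantic predictions as a soft gate on detection features, is sketched below in PyTorch. The module name, shapes, and gating formula are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class SemanticGate(nn.Module):
    """Hypothetical SWAG-style gating: weight detection features by the
    confidence of a semantic-segmentation branch, so semantically relevant
    regions are emphasized for detection."""
    def __init__(self, sem_classes, det_ch):
        super().__init__()
        self.to_gate = nn.Conv2d(sem_classes, det_ch, kernel_size=1)

    def forward(self, det_feat, sem_logits):    # (B, C, H, W), (B, K, H, W)
        gate = torch.sigmoid(self.to_gate(sem_logits.softmax(dim=1)))
        return det_feat * gate                  # selective feature transfer
```

- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]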
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range-view method is able to surpass point-, voxel-, and multi-view-fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
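A quick way to see the "many-to-one" problem RangeFormer addresses is to count how many points of a sweep land on the same range-image pixel, since all but one of them are dropped or merged during projection. The sketch below assumes per-point pixel coordinates u, v such as those computed in the projection example near the top of this page; the image size is again an assumption.

```python
import numpy as np

def collision_stats(u, v, H=64, W=2048):
    """Given per-point pixel coordinates (u, v), report how many points
    share a pixel; those points merge into one feature in a range image."""
    flat = v.astype(np.int64) * W + u.astype(np.int64)
    _, counts = np.unique(flat, return_counts=True)
    n_points = len(flat)
    n_dropped = n_points - len(counts)  # points overwritten by a nearer one
    print(f"{n_dropped}/{n_points} points share a pixel with another point")
    print(f"worst pixel holds {counts.max()} points")
```

- AOP-Net: All-in-One Perception Network for Joint LiDAR-based 3D Object Detection and Panoptic Segmentation [9.513467995188634]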
AOP-Net is a LiDAR-based multi-task framework that combines 3D object detection and panoptic segmentation.
AOP-Net achieves state-of-the-art performance among published works on the nuScenes benchmark for both 3D object detection and panoptic segmentation.
arXiv Detail & Related papers (2023-02-02T05:31:53Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate information from these views.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
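The simplest version of the multi-view fusion MVAF-Net performs, gathering each point's features from BEV, range-view, and camera feature maps and fusing them, can be sketched as below. MVAF-Net's actual adaptive attention weighting is not described in the summary, so this concatenate-and-MLP variant, along with all names and channel counts, is an assumption.

```python
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Hypothetical fusion of per-point features sampled from BEV, range-view,
    and camera feature maps, a common pattern in multi-view 3D detectors."""
    def __init__(self, ch_bev, ch_rv, ch_cam, out_ch):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch_bev + ch_rv + ch_cam, out_ch), nn.ReLU(),
            nn.Linear(out_ch, out_ch),
        )

    def forward(self, f_bev, f_rv, f_cam):      # each: (N, C_view) per point
        return self.mlp(torch.cat([f_bev, f_rv, f_cam], dim=-1))
```

- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]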
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
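The cross-task feature sharing DASS describes, enriching per-point semantic features with localization features from an auxiliary detection branch, follows a pattern that can be sketched in a few lines of PyTorch. The head structure and all dimensions below are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn

class DetectionAwareSegHead(nn.Module):
    """Hypothetical detection-aware segmentation head: per-point semantic
    features are concatenated with localization features from an auxiliary
    detection branch before classification."""
    def __init__(self, sem_ch, det_ch, n_classes):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(sem_ch + det_ch, sem_ch), nn.ReLU(),
            nn.Linear(sem_ch, n_classes),
        )

    def forward(self, sem_feat, det_feat):      # (N, sem_ch), (N, det_ch)
        return self.classifier(torch.cat([sem_feat, det_feat], dim=-1))
```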
This list is automatically generated from the titles and abstracts of the papers on this site.