Related papers: Small, Versatile and Mighty: A Range-View Perception Framework

Small, Versatile and Mighty: A Range-View Perception Framework

URL: http://arxiv.org/abs/2403.00325v1
Date: Fri, 1 Mar 2024 07:02:42 GMT
Title: Small, Versatile and Mighty: A Range-View Perception Framework
Authors: Qiang Meng, Xiao Wang, JiaBao Wang, Liujiang Yan, Ke Wang
Abstract summary: We propose a novel multi-task framework for 3D detection of LiDAR data. Our framework integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud. Among range-view-based methods, our model achieves new state-of-the-art detection performances on the Open dataset.
Score: 13.85089181673372
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite its compactness and information integrity, the range view representation of LiDAR data rarely occurs as the first choice for 3D perception tasks. In this work, we further push the envelop of the range-view representation with a novel multi-task framework, achieving unprecedented 3D detection performances. Our proposed Small, Versatile, and Mighty (SVM) network utilizes a pure convolutional architecture to fully unleash the efficiency and multi-tasking potentials of the range view representation. To boost detection performances, we first propose a range-view specific Perspective Centric Label Assignment (PCLA) strategy, and a novel View Adaptive Regression (VAR) module to further refine hard-to-predict box properties. In addition, our framework seamlessly integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud, without extra modules. Among range-view-based methods, our model achieves new state-of-the-art detection performances on the Waymo Open Dataset. Especially, over 10 mAP improvement over convolutional counterparts can be obtained on the vehicle class. Our presented results for other tasks further reveal the multi-task capabilities of the proposed small but mighty framework.

Related papers

RangeSAM: Leveraging Visual Foundation Models for Range-View repesented LiDAR segmentation [6.513648249086729]
We present the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with standard projection/back-projection to operate on point clouds.<n>Our approach achieves competitive performance on Semantic KITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines.
arXiv Detail & Related papers (2025-09-19T11:33:10Z)
AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization.<n>AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN) PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds. Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations. Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIou and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [12.713417063678335]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation. We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively. We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections. We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing. We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
AOP-Net: All-in-One Perception Network for Joint LiDAR-based 3D Object Detection and Panoptic Segmentation [9.513467995188634]
AOP-Net is a LiDAR-based multi-task framework that combines 3D object detection and panoptic segmentation. The AOP-Net achieves state-of-the-art performance for published works on the nuScenes benchmark for both 3D object detection and panoptic segmentation tasks.
arXiv Detail & Related papers (2023-02-02T05:31:53Z)
A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving. We present SimMOD, a Simple baseline for Multi-camera Object Detection. We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving. We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection. We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving. Current 3D semantic segmentation networks focus on convolutional architectures that perform great for well represented classes. We propose a novel Aware 3D Semantic Detection (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.