Rethinking Range View Representation for LiDAR Segmentation
- URL: http://arxiv.org/abs/2303.05367v3
- Date: Sun, 3 Sep 2023 05:02:00 GMT
- Title: Rethinking Range View Representation for LiDAR Segmentation
- Authors: Lingdong Kong and Youquan Liu and Runnan Chen and Yuexin Ma and Xinge Zhu and Yikang Li and Yuenan Hou and Yu Qiao and Ziwei Liu
- Abstract summary: "Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
- Score: 66.73116059734788
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: LiDAR segmentation is crucial for autonomous driving perception. Recent
trends favor point- or voxel-based methods as they often yield better
performance than the traditional range view representation. In this work, we
unveil several key factors in building powerful range view models. We observe
that the "many-to-one" mapping, semantic incoherence, and shape deformation are
possible impediments against effective learning from range view projections. We
present RangeFormer -- a full-cycle framework comprising novel designs across
network architecture, data augmentation, and post-processing -- that better
handles the learning and processing of LiDAR point clouds from the range view.
We further introduce a Scalable Training from Range view (STR) strategy that
trains on arbitrary low-resolution 2D range images, while still maintaining
satisfactory 3D segmentation accuracy. We show that, for the first time, a
range view method is able to surpass the point, voxel, and multi-view fusion
counterparts in the competing LiDAR semantic and panoptic segmentation
benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
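The "many-to-one" mapping mentioned in the abstract can be illustrated with a minimal sketch of a spherical range view projection. This is not the RangeFormer implementation: the function names `project_point` and `rasterize`, the field-of-view values (3 and -25 degrees, typical of a 64-beam sensor), and the random test scene are all assumptions made for illustration. The sketch rasterizes the same points at two horizontal resolutions and counts how many points lose their pixel to a nearer point.

```python
import math
import random


def project_point(x, y, z, height, width, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one 3D point to (row, col, depth) in a height x width range image.

    The field-of-view defaults are assumed values, not taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    depth = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)          # horizontal angle
    pitch = math.asin(z / depth)    # vertical angle
    col = math.floor(0.5 * (1.0 - yaw / math.pi) * width)
    row = math.floor((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height)
    # Clamp points that fall outside the assumed field of view.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col, depth


def rasterize(points, height, width):
    """Return ({(row, col): nearest depth}, number of points that lost a pixel)."""
    image = {}
    for x, y, z in points:
        row, col, depth = project_point(x, y, z, height, width)
        # Many-to-one: several points can share a pixel; keep the nearest one.
        if (row, col) not in image or depth < image[(row, col)]:
            image[(row, col)] = depth
    return image, len(points) - len(image)


# Hypothetical scene: random points around the sensor (seeded for repeatability).
random.seed(0)
points = [
    (random.uniform(-50, 50), random.uniform(-50, 50), random.uniform(-2, 1))
    for _ in range(20000)
]

for width in (2048, 512):
    image, lost = rasterize(points, 64, width)
    print(f"width={width}: occupied pixels={len(image)}, points lost={lost}")
```

Under these assumptions, a narrower range image can never lose fewer points than a wider one, because each coarse pixel is a union of fine pixels. That is the trade-off a low-resolution training strategy such as STR has to work around.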
Related papers
- FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [52.89847760590189]
3D scene understanding is a critical yet challenging task in autonomous driving.
Recent methods leverage the range-view representation to improve processing efficiency.
We re-design the workflow for range-view-based LiDAR semantic segmentation.
(2025-02-13T12:39:26Z)
- LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
(2025-01-07T18:59:59Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by the way the human visual system observes regions of interest from multiple views, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
(2024-04-11T03:00:00Z)
- Small, Versatile and Mighty: A Range-View Perception Framework [13.85089181673372]
We propose a novel multi-task framework for 3D detection of LiDAR data.
Our framework integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud.
Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open Dataset.
(2024-03-01T07:02:42Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
(2023-11-03T15:41:15Z)
- Few-Shot Panoptic Segmentation With Foundation Models [23.231014713335664]
We propose to leverage task-agnostic image features to enable few-shot panoptic segmentation by presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO).
In detail, our method combines a DINOv2 backbone with lightweight network heads for semantic segmentation and boundary estimation.
We show that our approach, albeit being trained with only ten annotated images, predicts high-quality pseudo-labels that can be used with any existing panoptic segmentation method.
(2023-09-19T16:09:01Z)
- Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap [9.770808277353128]
We propose a fast and high-performance LiDAR-based framework, referred to as Panoptic-PHNet.
We introduce a clustering pseudo heatmap as a new paradigm, which, followed by a center grouping module, yields instance centers for efficient clustering.
For backbone design, we fuse the fine-grained voxel features and the 2D Bird's Eye View (BEV) features with different receptive fields to utilize both detailed and global information.
(2022-05-14T08:16:13Z)
- Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility [17.929307870456416]
We present a novel framework for mesh reconstruction from unstructured point clouds.
We take advantage of the learned visibility of the 3D points in the virtual views and traditional graph-cut based mesh generation.
(2021-08-18T20:28:16Z)
- RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation [28.494690309193068]
We propose a novel range-point-voxel fusion network, namely RPVNet.
In this network, we devise a deep fusion framework with multiple and mutual information interactions among these three views.
By leveraging this efficient interaction and a relatively low voxel resolution, our method also proves to be more efficient.
(2021-03-24T04:24:12Z)
- Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification [53.6218051770131]
Cross-view consistent feature representation is key for accurate vehicle ReID.
Existing approaches resort to supervised cross-view learning using extensive extra viewpoint annotations.
We present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID.
(2021-03-09T11:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.