Rethinking Range View Representation for LiDAR Segmentation
- URL: http://arxiv.org/abs/2303.05367v3
- Date: Sun, 3 Sep 2023 05:02:00 GMT
- Title: Rethinking Range View Representation for LiDAR Segmentation
- Authors: Lingdong Kong and Youquan Liu and Runnan Chen and Yuexin Ma and Xinge
Zhu and Yikang Li and Yuenan Hou and Yu Qiao and Ziwei Liu
- Abstract summary: "Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments to effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
- Score: 66.73116059734788
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: LiDAR segmentation is crucial for autonomous driving perception. Recent
trends favor point- or voxel-based methods as they often yield better
performance than the traditional range view representation. In this work, we
unveil several key factors in building powerful range view models. We observe
that the "many-to-one" mapping, semantic incoherence, and shape deformation are
possible impediments to effective learning from range view projections. We
present RangeFormer -- a full-cycle framework comprising novel designs across
network architecture, data augmentation, and post-processing -- that better
handles the learning and processing of LiDAR point clouds from the range view.
We further introduce a Scalable Training from Range view (STR) strategy that
trains on arbitrary low-resolution 2D range images, while still maintaining
satisfactory 3D segmentation accuracy. We show that, for the first time, a
range view method is able to surpass the point, voxel, and multi-view fusion
counterparts on the competitive LiDAR semantic and panoptic segmentation
benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
Related papers
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on multiple tedious encoder-decoder streams and stages to gradually complete global localization and local refinement.
We instead model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Small, Versatile and Mighty: A Range-View Perception Framework [13.85089181673372]
We propose a novel multi-task framework for 3D detection from LiDAR data.
Our framework integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud.
Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open dataset.
arXiv Detail & Related papers (2024-03-01T07:02:42Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that combines all the results via voting (a minimal voting sketch follows this list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Few-Shot Panoptic Segmentation With Foundation Models [23.231014713335664]
We propose to leverage task-agnostic image features to enable few-shot panoptic segmentation, presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO).
In detail, our method combines a DINOv2 backbone with lightweight network heads for semantic segmentation and boundary estimation.
We show that our approach, despite being trained with only ten annotated images, predicts high-quality pseudo-labels that can be used with any existing panoptic segmentation method.
arXiv Detail & Related papers (2023-09-19T16:09:01Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, making it efficient and applicable to real-time panoramic HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Image Understands Point Cloud: Weakly Supervised 3D Semantic Segmentation via Association Learning [59.64695628433855]
We propose a novel cross-modality weakly supervised method for 3D segmentation, incorporating complementary information from unlabeled images.
We design a dual-branch network equipped with an active labeling strategy to make the most of a tiny fraction of labels.
Our method even outperforms the state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
arXiv Detail & Related papers (2022-09-16T07:59:04Z)
- Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap [9.770808277353128]
We propose a fast and high-performance LiDAR-based framework, referred to as Panoptic-PHNet.
We introduce a clustering pseudo heatmap as a new paradigm, which, followed by a center grouping module, yields instance centers for efficient clustering.
For backbone design, we fuse the fine-grained voxel features and the 2D Bird's Eye View (BEV) features with different receptive fields to utilize both detailed and global information.
arXiv Detail & Related papers (2022-05-14T08:16:13Z)
- Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility [17.929307870456416]
We present a novel framework for mesh reconstruction from unstructured point clouds.
We combine the learned visibility of the 3D points in the virtual views with traditional graph-cut based mesh generation.
arXiv Detail & Related papers (2021-08-18T20:28:16Z)
- RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation [28.494690309193068]
We propose a novel range-point-voxel fusion network, namely RPVNet.
In this network, we devise a deep fusion framework with multiple and mutual information interactions among these three views.
By leveraging this efficient interaction and a relatively low voxel resolution, our method is also shown to be more efficient.
arXiv Detail & Related papers (2021-03-24T04:24:12Z)
- Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification [53.6218051770131]
Cross-view consistent feature representation is key for accurate vehicle ReID.
Existing approaches resort to supervised cross-view learning that requires extensive extra viewpoint annotations.
We present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID.
arXiv Detail & Related papers (2021-03-09T11:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.