UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the
OpenPCSeg Codebase
- URL: http://arxiv.org/abs/2309.05573v1
- Date: Mon, 11 Sep 2023 16:00:22 GMT
- Title: UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the
OpenPCSeg Codebase
- Authors: Youquan Liu, Runnan Chen, Xin Li, Lingdong Kong, Yuchen Yang, Zhaoyang
Xia, Yeqi Bai, Xinge Zhu, Yuexin Ma, Yikang Li, Yu Qiao, Yuenan Hou
- Abstract summary: We present a unified multi-modal LiDAR segmentation network, termed UniSeg.
It accomplishes semantic segmentation and panoptic segmentation simultaneously.
We also construct OpenPCSeg, the largest and most comprehensive outdoor LiDAR segmentation codebase.
- Score: 43.95911443801265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point, voxel, and range views are three representative forms of point clouds. All of them provide accurate 3D measurements but lack color and texture information. RGB images are a natural complement to these point cloud views, and fully exploiting the information of both modalities enables more robust perception. In this paper, we present a unified multi-modal LiDAR segmentation network, termed UniSeg, which leverages the information of RGB images and the three views of the point cloud, and accomplishes semantic segmentation and panoptic segmentation simultaneously. Specifically, we first design the Learnable cross-Modal Association (LMA) module to automatically fuse voxel-view and range-view features with image features; it fully exploits the rich semantic information of images and is robust to calibration errors. Then, the enhanced voxel-view and range-view features are transformed to the point space, where the three views of point cloud features are further fused adaptively by the Learnable cross-View Association (LVA) module. Notably, UniSeg achieves promising results on three public benchmarks, i.e., SemanticKITTI, nuScenes, and the Waymo Open Dataset (WOD); it ranks 1st in two challenges, the LiDAR semantic segmentation challenge of nuScenes and the panoptic segmentation challenge of SemanticKITTI. In addition, we construct the OpenPCSeg codebase, the largest and most comprehensive outdoor LiDAR segmentation codebase. It contains most of the popular outdoor LiDAR segmentation algorithms and provides reproducible implementations. The OpenPCSeg codebase will be made publicly available at https://github.com/PJLab-ADG/PCSeg.
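The abstract describes the LMA and LVA modules only at a high level. The following is a minimal sketch, assuming per-element LiDAR features and image features already sampled around each projected LiDAR location, of how such learnable cross-modal and cross-view fusion could be wired up; the class names (SimpleLMA, SimpleLVA), the attention-based fusion, and all shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): attention-based cross-modal fusion of a
# LiDAR-view feature with image features sampled around its projection, followed by
# adaptive (learned softmax-weighted) fusion of point-, voxel-, and range-view
# features in point space. Shapes: N LiDAR elements, K image samples each, C channels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleLMA(nn.Module):
    """Cross-modal fusion: each LiDAR feature attends over K nearby image features,
    so small calibration errors can be absorbed by the learned attention weights."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.out = nn.Linear(2 * channels, channels)

    def forward(self, lidar_feat, img_feats):
        # lidar_feat: (N, C); img_feats: (N, K, C) sampled around the projected location
        q = self.q(lidar_feat).unsqueeze(1)                                  # (N, 1, C)
        k, v = self.k(img_feats), self.v(img_feats)                          # (N, K, C)
        attn = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)   # (N, K)
        fused_img = (attn.unsqueeze(-1) * v).sum(1)                          # (N, C)
        return self.out(torch.cat([lidar_feat, fused_img], dim=-1))


class SimpleLVA(nn.Module):
    """Cross-view fusion: per-point softmax weights over the three views."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Linear(3 * channels, 3)

    def forward(self, point_feat, voxel2point_feat, range2point_feat):
        # All inputs: (N, C), already gathered/interpolated to the point set.
        stacked = torch.stack([point_feat, voxel2point_feat, range2point_feat], dim=1)  # (N, 3, C)
        weights = F.softmax(self.gate(stacked.flatten(1)), dim=-1)                      # (N, 3)
        return (weights.unsqueeze(-1) * stacked).sum(1)                                 # (N, C)


if __name__ == "__main__":
    N, K, C = 1024, 9, 64
    lma, lva = SimpleLMA(C), SimpleLVA(C)
    voxel_feat = lma(torch.randn(N, C), torch.randn(N, K, C))   # voxel view + image
    range_feat = lma(torch.randn(N, C), torch.randn(N, K, C))   # range view + image
    fused = lva(torch.randn(N, C), voxel_feat, range_feat)      # (N, 64) fused point features
    print(fused.shape)
```

In this reading, robustness to calibration errors comes from attending over a small neighborhood of image features around each projected location rather than trusting a single projected pixel, and the per-point gating lets the network weight the three views differently per point.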
Related papers
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z)
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It uses the visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment [63.83894701779067]
We propose LCPS, the first LiDAR-Camera Panoptic Segmentation network.
In our approach, we conduct LiDAR-Camera fusion in three stages.
Our fusion strategy improves PQ by about 6.9% over the LiDAR-only baseline on the nuScenes dataset.
arXiv Detail & Related papers (2023-08-03T10:57:58Z)
- CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation [88.50067783122559]
CalibNet consists of three simple modules, including a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF).
Experiments show that CalibNet yields a promising result, i.e., 58.0% AP with a 320×480 input size on the COME15K-N test set.
arXiv Detail & Related papers (2023-07-16T16:49:59Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
- RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds [0.6119392435448721]
This paper takes advantage of the uneven range distribution of different LiDAR laser beams to propose a range-aware instance segmentation network, RangeSeg.
Experiments on the KITTI dataset show that RangeSeg outperforms the state-of-the-art semantic segmentation methods with an enormous speedup.
The whole RangeSeg pipeline meets the real-time requirement on an NVIDIA Jetson AGX Xavier at 19 frames per second on average.
arXiv Detail & Related papers (2022-05-02T09:57:59Z)
- Enhanced Boundary Learning for Glass-like Object Segmentation [55.45473926510806]
This paper aims to solve the glass-like object segmentation problem via enhanced boundary learning.
In particular, we first propose a novel refined differential module for generating finer boundary cues.
An edge-aware point-based graph convolution network module is proposed to model the global shape representation along the boundary.
arXiv Detail & Related papers (2021-03-29T16:18:57Z)
- RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation [28.494690309193068]
We propose a novel range-point-voxel fusion network, namely RPVNet.
In this network, we devise a deep fusion framework with multiple and mutual information interactions among the three views.
By leveraging this efficient interaction and a relatively lower voxel resolution, our method is also shown to be more efficient.
arXiv Detail & Related papers (2021-03-24T04:24:12Z)
- FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation [30.736361776703568]
Scene understanding based on LiDAR point clouds is an essential task for autonomous cars to drive safely.
Most existing methods simply stack different point attributes/modalities as image channels to increase information capacity.
We design FPS-Net, a convolutional fusion network that exploits the uniqueness and discrepancy among the projected image channels for optimal point cloud segmentation.
arXiv Detail & Related papers (2021-03-01T04:08:28Z)
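Several of the range-view papers above (RangeFormer, RangeSeg, RPVNet, FPS-Net) build on the same idea of projecting the point cloud into a 2D range image. The sketch below is illustrative and not taken from any of these papers; the sensor parameters (64 beams, a vertical field of view from +3° to -25°) and function name are assumptions. It makes the "many-to-one" issue concrete: multiple 3D points can land in the same pixel, and only one survives.

```python
# Minimal sketch (illustrative assumptions): spherical projection of a LiDAR point
# cloud into an (H, W) range image. Several points can map to the same pixel
# ("many-to-one"); here the closer point wins, as in common range-view pipelines.
import numpy as np


def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) array of x, y, z. Returns the range image and per-point pixel indices."""
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down

    r = np.linalg.norm(points, axis=1) + 1e-8                  # (N,) range of each point
    yaw = np.arctan2(points[:, 1], points[:, 0])               # azimuth in [-pi, pi]
    pitch = np.arcsin(points[:, 2] / r)                        # elevation angle

    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * W).astype(int)  # column index
    v = ((1.0 - (pitch - fov_down) / fov) * H).astype(int)     # row index
    u, v = np.clip(u, 0, W - 1), np.clip(v, 0, H - 1)

    # Fill farthest-first so the nearest point ends up stored at each pixel.
    order = np.argsort(-r)
    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v[order], u[order]] = r[order]
    return range_image, u, v


if __name__ == "__main__":
    pts = np.random.randn(100000, 3) * 10.0
    img, u, v = spherical_projection(pts)
    occupied = (img >= 0).sum()
    print(f"{len(pts)} points mapped to {occupied} occupied pixels")  # many-to-one collapse
```

Extra channels (intensity, x, y, z) can be filled into the same pixel grid in the same way, which is essentially what FPS-Net refers to when it discusses stacking point attributes as image channels.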