FIDNet: LiDAR Point Cloud Semantic Segmentation with Fully Interpolation
Decoding
- URL: http://arxiv.org/abs/2109.03787v1
- Date: Wed, 8 Sep 2021 17:20:09 GMT
- Title: FIDNet: LiDAR Point Cloud Semantic Segmentation with Fully Interpolation
Decoding
- Authors: Yiming Zhao, Lin Bai, and Xinming Huang
- Abstract summary: Projecting the point cloud on the 2D spherical range image transforms the LiDAR semantic segmentation to a 2D segmentation task on the range image.
We propose a new projection-based LiDAR semantic segmentation pipeline that consists of a novel network structure and an efficient post-processing step.
Our pipeline achieves the best performance among all projection-based methods with $64 \times 2048$ resolution and all point-wise solutions.
- Score: 5.599306291149907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Projecting the point cloud on the 2D spherical range image transforms the
LiDAR semantic segmentation to a 2D segmentation task on the range image.
However, the LiDAR range image still differs fundamentally from a regular 2D
RGB image; for example, each position on the range image encodes unique
geometric information. In this paper, we propose a new projection-based LiDAR
semantic segmentation pipeline that consists of a novel network structure and
an efficient post-processing step. In our network structure, we design a FID
(fully interpolation decoding) module that directly upsamples the
multi-resolution feature maps using bilinear interpolation. Inspired by the 3D
distance interpolation used in PointNet++, we argue that this FID module is a 2D
version of distance interpolation in $(\theta, \phi)$ space. As a parameter-free
decoding module, the FID largely reduces model complexity while maintaining
good performance. Besides the network structure, we empirically find that our
model predictions have clear boundaries between different semantic classes.
This leads us to rethink whether the widely used K-nearest-neighbor
post-processing is still necessary for our pipeline. We then realize that the
many-to-one mapping causes a blurring effect: multiple points are mapped to the
same pixel and share the same label. Therefore, we propose to handle those
occluded points by assigning each the label of its nearest predicted neighbor. This NLA
(nearest label assignment) post-processing step shows a better performance than
KNN with faster inference speed in the ablation study. On the SemanticKITTI
dataset, our pipeline achieves the best performance among all projection-based
methods with $64 \times 2048$ resolution and all point-wise solutions. With a
ResNet-34 as the backbone, both the training and testing of our model can be
finished on a single RTX 2080 Ti with 11 GB of memory. The code is released.
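The FID module described in the abstract, a parameter-free decoder that bilinearly upsamples multi-resolution feature maps to the full range-image resolution, can be sketched as follows. This is a minimal NumPy/SciPy illustration; the choice to concatenate the upsampled maps along the channel axis, and the shapes used, are assumptions rather than details confirmed by the paper.

```python
import numpy as np
from scipy.ndimage import zoom

def fid_decode(feature_maps, out_hw=(64, 2048)):
    """Parameter-free decoding sketch: bilinearly upsample each (C, h, w)
    feature map to the full range-image resolution and concatenate along
    the channel axis (the concatenation step is an assumption)."""
    upsampled = []
    for f in feature_maps:
        _, h, w = f.shape
        # order=1 gives bilinear interpolation; no learned weights involved.
        upsampled.append(zoom(f, (1, out_hw[0] / h, out_hw[1] / w), order=1))
    return np.concatenate(upsampled, axis=0)
```

Because the decoder has no learnable parameters, the model's capacity lives entirely in the encoder, which is consistent with the abstract's claim that FID reduces model complexity while keeping performance.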
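The NLA post-processing step can likewise be sketched. In this hypothetical implementation, `spherical_project` and its field-of-view limits are illustrative assumptions; the core idea follows the abstract: points occluded by the many-to-one pixel mapping receive the label of their nearest visible neighbor in 3D.

```python
import numpy as np
from scipy.spatial import cKDTree

def spherical_project(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an H x W range image.
    The vertical field-of-view limits are assumed values."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    r = np.linalg.norm(points, axis=1)
    theta = np.arcsin(points[:, 2] / r)            # elevation
    phi = np.arctan2(points[:, 1], points[:, 0])   # azimuth
    u = (fov_up - theta) / (fov_up - fov_down) * (H - 1)
    v = 0.5 * (1.0 - phi / np.pi) * (W - 1)
    return (np.round(u).clip(0, H - 1).astype(int),
            np.round(v).clip(0, W - 1).astype(int))

def nla_postprocess(points, pixel_labels):
    """Nearest Label Assignment: occluded points take the label of
    their nearest visible neighbor in 3D."""
    u, v = spherical_project(points)
    r = np.linalg.norm(points, axis=1)
    # Per pixel, only the closest point is the "visible" one.
    visible = np.zeros(len(points), dtype=bool)
    seen = set()
    for i in np.argsort(r):                        # nearest first
        if (u[i], v[i]) not in seen:
            seen.add((u[i], v[i]))
            visible[i] = True
    labels = np.empty(len(points), dtype=pixel_labels.dtype)
    labels[visible] = pixel_labels[u[visible], v[visible]]
    # Occluded points copy the label of their nearest visible point.
    tree = cKDTree(points[visible])
    _, nn = tree.query(points[~visible], k=1)
    labels[~visible] = labels[np.flatnonzero(visible)[nn]]
    return labels
```

Unlike KNN smoothing applied to every point, only the occluded points are touched here, which matches the abstract's report of faster inference with comparable or better accuracy.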
Related papers
- Differentiable Registration of Images and LiDAR Point Clouds with
VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- SATR: Zero-Shot Semantic Segmentation of 3D Shapes [74.08209893396271]
We explore the task of zero-shot semantic segmentation of 3D shapes by using large-scale off-the-shelf 2D image recognition models.
We develop the Segmentation Assignment with Topological Reweighting (SATR) algorithm and evaluate it on ShapeNetPart and our proposed FAUST benchmarks.
SATR achieves state-of-the-art performance and outperforms a baseline algorithm by 1.3% and 4% average mIoU.
arXiv Detail & Related papers (2023-04-11T00:43:16Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds [0.6119392435448721]
This paper takes advantage of the uneven range distribution of different LiDAR laser beams to propose a range-aware instance segmentation network, RangeSeg.
Experiments on the KITTI dataset show that RangeSeg outperforms state-of-the-art semantic segmentation methods with an enormous speedup.
The whole RangeSeg pipeline meets the real-time requirement on an NVIDIA® Jetson AGX Xavier at an average of 19 frames per second.
arXiv Detail & Related papers (2022-05-02T09:57:59Z)
- Meta-RangeSeg: LiDAR Sequence Semantic Segmentation Using Multiple Feature Aggregation [21.337629798133324]
We propose a novel approach to semantic segmentation for LiDAR sequences named Meta-RangeSeg.
A novel range residual image representation is introduced to capture the spatial-temporal information.
An efficient U-Net backbone is used to obtain the multi-scale features.
arXiv Detail & Related papers (2022-02-27T14:46:13Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels [30.3378171262436]
We design a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network.
Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%.
It is also efficient in that our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPS and model parameters.
arXiv Detail & Related papers (2021-06-25T01:27:26Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z) - End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z) - Learning to Segment 3D Point Clouds in 2D Image Space [20.119802932358333]
We show how to efficiently project 3D point clouds into a 2D image space.
Traditional 2D convolutional neural networks (CNNs) such as U-Net can be applied for segmentation.
arXiv Detail & Related papers (2020-03-12T03:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.