TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point
Cloud Semantic Segmentation
- URL: http://arxiv.org/abs/2302.08594v3
- Date: Thu, 12 Oct 2023 05:43:45 GMT
- Title: TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point
Cloud Semantic Segmentation
- Authors: Zifan Yu, Meida Chen, Zhikang Zhang, Suya You, Raghuveer Rao, Sanjeev
Agarwal, and Fengbo Ren
- Abstract summary: We propose a transformer-based uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner.
Our TransUPR achieves state-of-the-art performance, i.e., 68.2% mean Intersection over Union (mIoU) on the Semantic KITTI benchmark.
- Score: 6.587305905804226
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS)
approaches have bottlenecks resulting from the boundary-blurring problem of
convolutional neural networks (CNNs) and the quantization loss of spherical
projection. In this work, we propose a transformer-based plug-and-play
uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in
a learnable manner, which leads to an improved segmentation performance.
Uncertain points are sampled from the coarse semantic segmentation results of
2D image segmentation: they are the points located close to object boundaries
in the 2D range image representation, together with the background points of
the 3D spherical projection. Following that, the geometry and coarse semantic
features of uncertain points are aggregated from neighboring points in 3D space
without adding expensive computation or memory footprint. Finally, the transformer-based
refiner, which contains four stacked self-attention layers, along with an MLP
module, is utilized for uncertain point classification on the concatenated
features of self-attention layers. As the proposed refiner is independent of 2D
CNNs, our TransUPR can be easily integrated into any existing image-based LiDAR
PCSS approaches, e.g., CENet. Our TransUPR with the CENet achieves
state-of-the-art performance, i.e., 68.2% mean Intersection over Union (mIoU)
on the Semantic KITTI benchmark, which provides a performance improvement of
0.6% on the mIoU compared to the original CENet.
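
The mIoU figures quoted in the abstract average the per-class Intersection over Union, IoU = TP / (TP + FP + FN). The following is a minimal sketch of that metric in plain Python, not the benchmark's official evaluation code. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union == 0:
            # Assumption: a class absent from both sets is skipped.
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy per-point labels (hypothetical data, three classes).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Because the metric averages over classes rather than points, errors on rare classes weigh as much as errors on frequent ones.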
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z)
- Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation [62.258256483231484]
LiDAR point cloud semantic segmentation enables the robots to obtain fine-grained semantic information of the surrounding environment.
Many works project the point cloud onto the 2D image and adopt the 2D Convolutional Neural Networks (CNNs) or vision transformer for LiDAR point cloud semantic segmentation.
In this paper, we propose a novel spherical frustum structure to avoid quantized information loss.
arXiv Detail & Related papers (2023-11-29T09:55:13Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- CPGNet: Cascade Point-Grid Fusion Network for Real-Time LiDAR Semantic Segmentation [8.944151935020992]
We propose Cascade Point-Grid Fusion Network (CPGNet), which ensures both effectiveness and efficiency.
CPGNet without ensemble models or TTA is comparable with the state-of-the-art RPVNet, while it runs 4.7 times faster.
arXiv Detail & Related papers (2022-04-21T06:56:30Z)
- CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data.
CpT is an improvement over existing attention-based Convolutional Neural Networks as well as previous 3D point cloud processing transformers.
Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z)
- S3Net: 3D LiDAR Sparse Semantic Segmentation Network [1.330528227599978]
S3Net is a novel convolutional neural network for LiDAR point cloud semantic segmentation.
It adopts an encoder-decoder backbone that consists of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM).
arXiv Detail & Related papers (2021-03-15T22:15:24Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
Our method achieves the 1st place in the leaderboard of Semantic KITTI and outperforms existing methods on nuScenes with a noticeable margin, about 4%.
arXiv Detail & Related papers (2020-11-19T18:53:11Z)
- Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models.
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
- PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation [111.7241018610573]
We present PointGroup, a new end-to-end bottom-up architecture for instance segmentation.
We design a two-branch network to extract point features and predict semantic labels and offsets, for shifting each point towards its respective instance centroid.
A clustering component follows, using both the original and offset-shifted point coordinate sets to take advantage of their complementary strengths.
We conduct extensive experiments on two challenging datasets, ScanNet v2 and S3DIS, on which our method achieves the highest performance, 63.6% and 64.0%, compared to 54.9% and 54.4% achieved by former best
arXiv Detail & Related papers (2020-04-03T16:26:37Z)
- Quaternion Equivariant Capsule Networks for 3D Point Clouds [58.566467950463306]
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations.
We connect dynamic routing between capsules to the well-known Weiszfeld algorithm.
Based on our operator, we build a capsule network that disentangles geometry from pose.
arXiv Detail & Related papers (2019-12-27T13:51:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.