CPGNet: Cascade Point-Grid Fusion Network for Real-Time LiDAR Semantic
Segmentation
- URL: http://arxiv.org/abs/2204.09914v1
- Date: Thu, 21 Apr 2022 06:56:30 GMT
- Authors: Xiaoyan Li, Gang Zhang, Hongyu Pan, Zhenhua Wang
- Abstract summary: We propose Cascade Point-Grid Fusion Network (CPGNet), which ensures both effectiveness and efficiency.
CPGNet without ensemble models or TTA is comparable with the state-of-the-art RPVNet, while it runs 4.7 times faster.
- Score: 8.944151935020992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR semantic segmentation, essential for advanced autonomous
driving, must be accurate, fast, and easy to deploy on mobile platforms.
Previous point-based or sparse voxel-based methods are far from real-time
because they rely on time-consuming neighbor searching or sparse 3D
convolution. Recent 2D projection-based methods, including range view and
multi-view fusion, can run in real time but suffer from lower accuracy due to
information loss during the 2D projection. In addition, previous methods
usually adopt test-time augmentation (TTA) to improve performance, which
further slows down inference. To achieve a better speed-accuracy trade-off,
we propose the Cascade Point-Grid Fusion Network (CPGNet), which ensures both
effectiveness and efficiency mainly through two techniques: 1) the novel
Point-Grid (PG) fusion block extracts semantic features mainly on the 2D
projected grid for efficiency, while summarizing both 2D and 3D features on
the 3D points for minimal information loss; 2) the proposed transformation
consistency loss narrows the gap between single-pass model inference and TTA.
Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that
CPGNet, without ensemble models or TTA, is comparable with the
state-of-the-art RPVNet while running 4.7 times faster.
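The point-to-grid and grid-to-point idea behind the PG fusion block can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `point_to_grid` and `grid_to_point`, the bird's-eye-view grid, max pooling as the aggregation, and nearest-cell lookup on the way back are all assumptions chosen for simplicity, and the 2D convolution that would run on the grid is omitted.

```python
import numpy as np

def point_to_grid(xy, feats, grid_min, cell_size, grid_shape):
    """Max-pool per-point features into a 2D grid (hypothetical helper).

    xy:    (N, 2) point coordinates on the projection plane
    feats: (N, C) per-point features
    Returns the (H, W, C) grid and the (N, 2) integer cell index of each point.
    """
    h, w = grid_shape
    idx = np.floor((xy - np.asarray(grid_min)) / cell_size).astype(np.int64)
    # Clamp points that fall outside the grid onto the border cells.
    idx = np.clip(idx, 0, np.array([h - 1, w - 1]))
    grid = np.full((h, w, feats.shape[1]), -np.inf)
    # Unbuffered scatter-max: points sharing a cell keep the per-channel maximum.
    np.maximum.at(grid, (idx[:, 0], idx[:, 1]), feats)
    # Cells that received no point carry zero features.
    grid[np.isneginf(grid)] = 0.0
    return grid, idx

def grid_to_point(grid, idx):
    """Read each point's cell feature back from the grid (nearest-cell lookup)."""
    return grid[idx[:, 0], idx[:, 1]]

# Three points with two feature channels; the first two share a cell.
xy = np.array([[0.2, 0.3], [0.4, 0.1], [1.5, 0.5]])
feats = np.array([[1.0, 5.0], [3.0, 2.0], [7.0, 0.0]])

grid, idx = point_to_grid(xy, feats, grid_min=(0.0, 0.0),
                          cell_size=1.0, grid_shape=(2, 2))
back = grid_to_point(grid, idx)

# Fuse the original point features with the features gathered from the grid.
fused = np.concatenate([feats, back], axis=1)
print(fused.shape)
```

Keeping the per-point index from the projection step makes the return trip a plain gather, so every point retains its own features alongside the grid context rather than being replaced by its cell.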
Related papers
- FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [52.89847760590189]
3D scene understanding is a critical yet challenging task in autonomous driving.
Recent methods leverage the range-view representation to improve processing efficiency.
We re-design the workflow for range-view-based LiDAR semantic segmentation.
arXiv Detail & Related papers (2025-02-13T12:39:26Z)
- Fast Occupancy Network [15.759329665907229]
Occupancy Network predicts the category of each voxel in a specified 3D space around the ego vehicle.
We present a simple and fast Occupancy Network model, which adopts a deformable 2D convolutional layer to lift BEV feature to 3D voxel feature.
We also present an efficient voxel feature pyramid network (FPN) module to improve performance at little computational cost.
arXiv Detail & Related papers (2024-12-10T03:46:03Z)
- UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height [2.975860548186652]
Occupancy and 3D object detection are two standard tasks in modern autonomous driving systems.
We propose a method to achieve fast 3D object detection and occupancy prediction (UltimateDO).
arXiv Detail & Related papers (2024-09-17T13:14:13Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [69.32451612060214]
Real-scanned 3D point clouds are often incomplete, and it is important to recover complete point clouds for downstream applications.
Most existing point cloud completion methods use Chamfer Distance (CD) loss for training.
We propose a novel Point Diffusion-Refinement (PDR) paradigm for point cloud completion.
arXiv Detail & Related papers (2021-12-07T06:59:06Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving [6.389322215324224]
We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction.
The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation.
The model is 5X faster than other top-accuracy models, with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on the KITTI dataset.
arXiv Detail & Related papers (2021-04-21T22:06:39Z)
- Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models.
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
- Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful but it would be computationally costly to directly apply convolutions on point data after voxelizing the entire point clouds to a dense regular 3D grid.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
- Pointwise Attention-Based Atrous Convolutional Neural Networks [15.499267533387039]
A pointwise attention-based atrous convolutional neural network architecture is proposed to efficiently deal with a large number of points.
The proposed model has been evaluated on the two most important 3D point cloud datasets for the 3D semantic segmentation task.
It achieves a reasonable performance compared to state-of-the-art models in terms of accuracy, with a much smaller number of parameters.
arXiv Detail & Related papers (2019-12-27T13:12:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.