Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR
Segmentation
- URL: http://arxiv.org/abs/2011.10033v1
- Date: Thu, 19 Nov 2020 18:53:11 GMT
- Title: Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR
Segmentation
- Authors: Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li,
Hongsheng Li, Dahua Lin
- Abstract summary: State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
Our method achieves 1st place on the SemanticKITTI leaderboard and outperforms existing methods on nuScenes by a noticeable margin of about 4%.
- Score: 81.02742110604161
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art methods for large-scale driving-scene LiDAR segmentation
often project the point clouds to 2D space and then process them via 2D
convolution. Although this projection-based pipeline shows competitive results
on point clouds, it inevitably alters and abandons the 3D topology and
geometric relations. A natural remedy is to utilize 3D voxelization and a 3D convolution
network. However, we found that in the outdoor point cloud, the improvement
obtained in this way is quite limited. An important reason is the property of
the outdoor point cloud, namely sparsity and varying density. Motivated by this
investigation, we propose a new framework for outdoor LiDAR segmentation,
where cylindrical partition and asymmetrical 3D convolution networks are
designed to explore the 3D geometric pattern while maintaining these inherent
properties. Moreover, a point-wise refinement module is introduced to alleviate
the interference of lossy voxel-based label encoding. We evaluate the proposed
model on two large-scale datasets, i.e., SemanticKITTI and nuScenes. Our method
achieves 1st place on the SemanticKITTI leaderboard and outperforms
existing methods on nuScenes by a noticeable margin of about 4%. Furthermore,
the proposed 3D framework also generalizes well to LiDAR panoptic segmentation
and LiDAR 3D detection.
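The cylindrical partition at the core of the framework can be sketched as follows: each point is converted from Cartesian to cylindrical coordinates (radius, azimuth, height) and binned into a fixed grid, so cells farther from the sensor cover a larger area and better match the falling point density. A minimal NumPy sketch; the grid size and coordinate ranges here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def cylindrical_partition(points, grid_size=(480, 360, 32),
                          rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map Cartesian LiDAR points (N, 3) to cylindrical voxel indices (N, 3).

    Grid size and ranges are assumed values for illustration.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Cylindrical coordinates: radial distance and azimuth angle.
    rho = np.sqrt(x ** 2 + y ** 2)
    phi = np.arctan2(y, x)  # azimuth in [-pi, pi]
    # Normalize each coordinate into [0, 1) over its range.
    coords = np.stack([
        (rho - rho_range[0]) / (rho_range[1] - rho_range[0]),
        (phi + np.pi) / (2 * np.pi),
        (z - z_range[0]) / (z_range[1] - z_range[0]),
    ], axis=1)
    # Discretize and clamp to valid voxel indices.
    idx = np.clip((coords * np.array(grid_size)).astype(np.int64),
                  0, np.array(grid_size) - 1)
    return idx

pts = np.array([[10.0, 0.0, 0.0],     # rho = 10, phi = 0
                [-10.0, 0.0, -1.0]])  # rho = 10, phi = pi
print(cylindrical_partition(pts))
```

Note that both points land in the same radial bin: the azimuthal bins grow wider with distance, which is exactly the property that keeps per-cell point density more uniform than a uniform Cartesian grid.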
Related papers
- DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping [46.80755234561584]
Recent learning-based methods integrate neural implicit representations and optimizable feature grids to approximate surfaces of 3D scenes.
In this work we depart from fitting LiDAR data exactly, instead letting the network optimize a non-metric monotonic implicit field defined in 3D space.
Our algorithm achieves high-quality dense 3D mapping performance as captured by multiple quantitative and perceptual measures and visual results obtained for Mai City, Newer College, and KITTI benchmarks.
arXiv Detail & Related papers (2024-03-26T09:58:06Z)
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by leveraging a class-aware local grouping strategy on object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We overcome this by converting the voxel-based sparse 3D feature volumes into the sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
- Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding [19.134536179555102]
We propose an alternative approach to overcome the limitations of CNN based approaches by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves on par state-of-the-art accuracy with improved training time and model stability thus indicating strong potential for further research.
arXiv Detail & Related papers (2020-11-29T12:56:19Z)
- Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models.
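The spherical (range-image) projection used by such multi-projection methods can be sketched in a few lines: each point's azimuth and elevation angles are mapped to the column and row of a 2D image, after which standard 2D convolutions apply. A minimal NumPy sketch; the image size and vertical field of view below are assumed values, not the paper's configuration:

```python
import numpy as np

def spherical_project(points, h=32, w=1024, fov_up=10.0, fov_down=-30.0):
    """Project (N, 3) points to (row, col) pixels of an H x W range image.

    Image size and vertical FOV (degrees) are assumed values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)    # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)  # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Horizontal: map yaw over [-pi, pi] to columns [0, w).
    col = ((yaw + np.pi) / (2 * np.pi) * w).astype(np.int64)
    # Vertical: map pitch within the vertical FOV to rows [0, h),
    # with row 0 at the top (largest elevation).
    row = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(np.int64)
    # Clamp points that fall outside the image bounds.
    return np.clip(np.stack([row, col], axis=1), 0, [h - 1, w - 1])
```

This is the step the main paper argues against relying on exclusively: the projection discards one dimension of the 3D topology, which motivates keeping a native 3D (cylindrical) representation instead.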
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.