Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion
- URL: http://arxiv.org/abs/2407.09697v1
- Date: Fri, 12 Jul 2024 21:41:57 GMT
- Title: Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion
- Authors: Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu
- Abstract summary: Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form.
However, RV-based methods fall short in providing robust segmentation for occluded points.
We propose a new LiDAR and camera range-view-based 3D point cloud semantic segmentation method (LaCRange).
In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.
- Score: 18.431017678057348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and camera range-view-based 3D point cloud semantic segmentation method (LaCRange). Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effects of the RV projection of RGB images. Moreover, a context-based feature fusion module is introduced for robust, information-preserving sensor fusion. Finally, to address the limited resolution of RV and its loss of 3D topology, a new point refinement scheme is devised for proper aggregation of features in 2D and augmentation of point features in 3D. We evaluate the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.
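To make the range-view setting concrete, here is a minimal sketch of the standard spherical projection that RV pipelines build on; the image resolution, field-of-view bounds, and function name are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def range_view_projection(points, H=32, W=1024, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) LiDAR point cloud to an (H, W) range image
    via the spherical projection commonly used by range-view methods."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)              # range per point
    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))      # elevation

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(int)                     # column
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * H).astype(int)  # row

    # write far-to-near so each pixel keeps the closest point
    order = np.argsort(-r)
    image = np.full((H, W), -1.0, dtype=np.float32)
    image[np.clip(v[order], 0, H - 1), np.clip(u[order], 0, W - 1)] = r[order]
    return image
```

The same (u, v) indices can scatter RGB or semantic features into the image; because many pixels receive no point, a projected RGB image is sparse and distorted, which is precisely the effect the paper's DCKD strategy targets.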
Related papers
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It identifies redundant parameters in the pretrained model whose removal causes minimal distortion in both locality and confidence.
The framework minimizes the distortion of the network output so as to maximally preserve detection precision.
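The summary does not state DM3D's exact criterion, so the sketch below only illustrates the general recipe of distortion-aware post-training pruning: score each weight of a linear layer by a cheap proxy for how much its removal would perturb the output on calibration data, then zero the lowest-scoring fraction. The scoring rule and all names here are assumptions, not the paper's algorithm.

```python
import numpy as np

def prune_min_distortion(W, X, sparsity=0.5):
    """Zero the `sparsity` fraction of weights whose removal least
    distorts the layer output Y = X @ W.T on calibration data X.
    Proxy score per weight: |W_ij| * ||X[:, j]||."""
    act_norm = np.linalg.norm(X, axis=0)           # (in_features,)
    scores = np.abs(W) * act_norm[None, :]         # (out, in)
    k = int(sparsity * W.size)
    thresh = np.partition(scores.ravel(), k)[k]    # k-th smallest score
    return W * (scores >= thresh)

# toy check: relative output distortion stays modest at 50% sparsity
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
X = rng.normal(size=(256, 128))
W_p = prune_min_distortion(W, X)
print(np.linalg.norm(X @ (W - W_p).T) / np.linalg.norm(X @ W.T))
```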
arXiv Detail & Related papers (2024-07-02T09:33:32Z) - CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction [81.98244738773766]
We present CCD-3DR, which exploits a novel centered diffusion probabilistic model for consistent local feature conditioning.
CCD-3DR outperforms all competitors by a large margin, with over 40% improvement.
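As a heavily hedged toy reading of "centered" diffusion: re-center the point cloud after every noising step so its center of mass never drifts away from the conditioning features. This illustrates only the centering constraint, not CCD-3DR's actual formulation.

```python
import numpy as np

def centered_forward_diffusion(x0, betas, rng):
    """Toy DDPM forward process over an (N, 3) point cloud that
    re-centers the cloud after every step, keeping its center of
    mass fixed at the origin."""
    x = x0 - x0.mean(axis=0, keepdims=True)        # start centered
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        x -= x.mean(axis=0, keepdims=True)         # centering constraint
    return x

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3)) + 5.0           # deliberately off-center
out = centered_forward_diffusion(cloud, np.linspace(1e-4, 0.02, 100), rng)
print(out.mean(axis=0))                            # ~ (0, 0, 0)
```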
arXiv Detail & Related papers (2023-08-15T15:27:42Z) - CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by leveraging a class-aware local grouping strategy on object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
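As a rough illustration of class-aware grouping, the sketch below greedily merges voxel centers that share a predicted class and lie within a radius of a seed, and emits each group's axis-aligned bounding box as a proposal. CAGroup3D's actual grouping and its sparse RoI pooling are more involved; the radius and names are assumptions.

```python
import numpy as np

def class_aware_proposals(centers, labels, radius=0.5):
    """Greedy class-aware grouping of (M, 3) voxel centers with (M,)
    predicted class labels; returns (class, box_min, box_max) tuples."""
    proposals = []
    for cls in np.unique(labels):
        pts = centers[labels == cls]               # same-class voxels only
        unused = np.ones(len(pts), dtype=bool)
        while unused.any():
            seed = pts[unused][0]
            member = unused & (np.linalg.norm(pts - seed, axis=1) < radius)
            proposals.append((cls, pts[member].min(axis=0),
                                   pts[member].max(axis=0)))
            unused &= ~member                      # consume the group
    return proposals
```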
arXiv Detail & Related papers (2022-10-09T13:38:48Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize 3D voxelization and 3D convolution networks.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
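The cylindrical partition itself is simple to state: bin points by (radius, azimuth, height) instead of (x, y, z), so that cells cover the dense near range and the sparse far range more evenly. A minimal sketch, with grid sizes and bounds chosen arbitrarily rather than taken from the paper:

```python
import numpy as np

def cylindrical_partition(points, grid=(480, 360, 32),
                          rho_max=50.0, z_min=-4.0, z_max=2.0):
    """Map (N, 3) Cartesian points to integer (rho, phi, z) voxel
    indices of a cylindrical grid; out-of-range points clamp to the
    boundary cells."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)                     # radial distance
    phi = np.arctan2(y, x)                         # azimuth in [-pi, pi]

    i = (rho / rho_max * grid[0]).astype(int)
    j = ((phi + np.pi) / (2 * np.pi) * grid[1]).astype(int)
    k = ((z - z_min) / (z_max - z_min) * grid[2]).astype(int)
    return np.stack([np.clip(i, 0, grid[0] - 1),
                     np.clip(j, 0, grid[1] - 1),
                     np.clip(k, 0, grid[2] - 1)], axis=1)
```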
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection [15.641616738865276]
We propose a general multimodal fusion framework FusionPainting to fuse the 2D RGB image and 3D point clouds at a semantic level for boosting the 3D object detection task.
Especially, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector.
The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark.
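FusionPainting's semantic-level fusion builds on the idea of "painting" LiDAR points with class scores sampled from a 2D segmentation map. The sketch below shows that basic painting step under assumed calibration inputs (3x3 intrinsics K, 4x4 LiDAR-to-camera transform); the paper's adaptive attention fusion is beyond this sketch.

```python
import numpy as np

def paint_points(points, seg_scores, K, T_cam_lidar):
    """Append (H, W, C) 2D semantic scores to (N, 3) LiDAR points by
    projecting each point into the image plane."""
    N = len(points)
    pts_h = np.concatenate([points, np.ones((N, 1))], axis=1)  # homogeneous
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]                     # camera frame
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.where(uv[:, 2:3] == 0, 1e-8, uv[:, 2:3])

    H, W, C = seg_scores.shape
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    valid = (cam[:, 2] > 0.1) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    painted = np.zeros((N, C), dtype=seg_scores.dtype)
    painted[valid] = seg_scores[v[valid], u[valid]]            # class scores
    return np.concatenate([points, painted], axis=1)           # (N, 3 + C)
```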
arXiv Detail & Related papers (2021-06-23T14:53:22Z) - R-AGNO-RPN: A LIDAR-Camera Region Deep Network for Resolution-Agnostic Detection [3.4761212729163313]
We propose R-AGNO-RPN, a region proposal network built on the fusion of 3D point clouds and RGB images.
Our approach is also designed to work at low point cloud resolutions.
arXiv Detail & Related papers (2020-12-10T15:22:58Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
Our method achieves first place on the SemanticKITTI leaderboard and outperforms existing methods on nuScenes by a noticeable margin of about 4%.
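The "asymmetrical" part refers to non-cubic, factorized kernels. As a sketch (kernel shapes chosen for brevity; the paper's residual blocks differ), stacking a 3x1x3 and a 1x3x1 convolution covers a 3x3x3 neighborhood with 12 weights instead of 27, biasing capacity toward the horizontal plane along which driving-scene objects extend:

```python
import numpy as np
from itertools import product

def conv3d(x, k):
    """Valid (no padding, stride 1) 3D cross-correlation of volume x
    with kernel k, written as an explicit shift-and-add."""
    kd, kh, kw = k.shape
    D, H, W = x.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for dz, dy, dx in product(range(kd), range(kh), range(kw)):
        out += k[dz, dy, dx] * x[dz:dz + out.shape[0],
                                 dy:dy + out.shape[1],
                                 dx:dx + out.shape[2]]
    return out

x = np.random.default_rng(0).normal(size=(16, 16, 16))
y = conv3d(conv3d(x, np.ones((3, 1, 3)) / 9), np.ones((1, 3, 1)) / 3)
print(y.shape)                                     # (14, 14, 14)
```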
arXiv Detail & Related papers (2020-11-19T18:53:11Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
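The step structure is easy to mock up: the state is the current 7-parameter box (center, size, yaw), an action changes exactly one parameter by a fixed delta, and the reward arrives only at the end of the episode. The sketch below uses a random stand-in policy and an L1 proxy reward; the paper learns the policy with reinforcement learning against an IoU-based reward.

```python
import numpy as np

# (x, y, z, w, l, h, yaw); DELTAS[i] perturbs only parameter i
DELTAS = np.eye(7) * np.array([0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05])

def reward(box, gt):
    """Proxy reward: negative L1 distance to the ground-truth box."""
    return -np.abs(box - gt).sum()

def refine(box, gt, policy, steps=50):
    """One episode: each step the policy picks (parameter index, sign)
    and exactly one box parameter changes; the reward is delayed."""
    for _ in range(steps):
        idx, sign = policy(box)
        box = box + sign * DELTAS[idx]
    return box, reward(box, gt)

rng = np.random.default_rng(0)
random_policy = lambda box: (rng.integers(7), rng.choice([-1.0, 1.0]))
final_box, ep_reward = refine(np.zeros(7), np.full(7, 0.5), random_policy)
print(final_box.round(2), ep_reward.round(2))
```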
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and 3D cylinder convolution based framework, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z) - Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection [41.59388513615775]
This paper presents a novel 3D object detection framework that processes LiDAR data directly on its native representation: range images.
Benefiting from the compactness of range images, 2D convolutions can efficiently process dense LiDAR data of a scene.
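The range-conditioning idea can be sketched directly: choose a per-pixel dilation rate inversely proportional to range, so a fixed-tap kernel spans a roughly constant metric extent whether an object is near (many pixels wide) or far (few pixels wide). The scaling constant and the wrap-around gather below are my assumptions, not the paper's exact operator.

```python
import numpy as np

def range_conditioned_dilation(range_img, base=20.0, d_max=8):
    """Per-pixel dilation rate ~ 1/range, clipped to [1, d_max]."""
    r = np.maximum(range_img, 1e-3)
    return np.clip(np.round(base / r), 1, d_max).astype(int)

def dilated_gather(feat, d):
    """Sample left/center/right taps with per-pixel dilation d,
    wrapping horizontally (a range image is a 360-degree panorama)."""
    H, W = feat.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    left = feat[rows, (cols - d) % W]
    right = feat[rows, (cols + d) % W]
    return np.stack([left, feat, right], axis=-1)  # (H, W, 3) taps
```

The three taps per pixel would then feed a small learned kernel, giving the scale-adaptive receptive field the title describes.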
arXiv Detail & Related papers (2020-05-20T09:24:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.