LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic
Segmentation
- URL: http://arxiv.org/abs/2108.07511v1
- Date: Tue, 17 Aug 2021 08:53:11 GMT
- Title: LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic
Segmentation
- Authors: Lin Zhao, Hui Zhou, Xinge Zhu, Xiao Song, Hongsheng Li, Wenbing Tao
- Abstract summary: We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
- Score: 78.74202673902303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera and 3D LiDAR sensors have become indispensable devices in modern
autonomous driving vehicles: the camera provides fine-grained texture and color
information in 2D space, while the LiDAR captures more precise and longer-range
distance measurements of the surrounding environment. The complementary
information from these two sensors makes two-modality fusion a desirable option.
However, two major issues hinder the performance of camera-LiDAR fusion, i.e.,
how to effectively fuse the two modalities and how to precisely align them (the
weak spatiotemporal synchronization problem). In this paper, we propose a
coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR
segmentation. For the first issue, unlike previous works that fuse point cloud
and image information in a one-to-one manner, the proposed method fully utilizes
the contextual information of images and introduces a simple but effective
early-fusion strategy. For the second issue, the weak spatiotemporal
synchronization problem, an offset rectification approach is designed to align
the two-modality features. The cooperation of these two components leads to
effective camera-LiDAR fusion. Experimental results on the nuScenes dataset show
that the proposed LIF-Seg outperforms existing methods by a large margin.
Ablation studies and analyses demonstrate that LIF-Seg can effectively tackle
the weak spatiotemporal synchronization problem.
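As a rough illustration of the alignment issue described above: each LiDAR point must be projected into the camera image through the extrinsic and intrinsic calibration before its features can be paired with pixel features, and any calibration or timing error shifts the projected pixel away from its true location. The sketch below is a generic point-to-pixel projection, not the authors' code; the calibration matrices, shapes, and toy data are hypothetical placeholders meant only to show the step on which an early-fusion strategy such as LIF-Seg's (and its offset rectification) operates.

```python
# Minimal sketch of LiDAR-to-camera projection (not the paper's implementation).
# Calibration matrices and toy data below are hypothetical placeholders.
import numpy as np

def project_lidar_to_image(points_xyz, T_cam_from_lidar, K):
    """Project LiDAR points (N, 3) into pixel coordinates (N, 2).

    points_xyz:       (N, 3) points in the LiDAR frame.
    T_cam_from_lidar: (4, 4) extrinsic transform, LiDAR frame -> camera frame.
    K:                (3, 3) camera intrinsic matrix.
    Returns (uv, valid): pixel coordinates and a mask of points in front of the camera.
    """
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])        # homogeneous coords (N, 4)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]          # points in camera frame (N, 3)
    valid = pts_cam[:, 2] > 1e-3                             # keep points in front of the camera
    uvw = (K @ pts_cam.T).T                                  # apply intrinsics (N, 3)
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-3, None)       # perspective divide
    return uv, valid

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform([-10.0, -10.0, 0.5], [10.0, 10.0, 3.0], size=(5, 3))
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    T = np.eye(4)                                            # identity extrinsics for the toy example
    uv, valid = project_lidar_to_image(points, T, K)
    print(uv[valid])
```

In a fusion network of this kind, the projected coordinates would typically be used to sample image (context) features for each point, and a learned offset could compensate for residual misalignment caused by weak spatiotemporal synchronization before the features are fused.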
Related papers
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object
Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
arXiv Detail & Related papers (2023-06-02T10:57:41Z)
- FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z)
- CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer [14.849645397321185]
Camera and radar sensors have significant advantages in cost, reliability, and maintenance compared to LiDAR.
Existing fusion methods often fuse the outputs of the single modalities at the result level, known as the late fusion strategy.
Here we propose a novel proposal-level early fusion approach that effectively exploits both spatial and contextual properties of camera and radar for 3D object detection.
Our camera-radar fusion approach achieves the state-of-the-art 41.1% mAP and 52.3% NDS on the nuScenes test set, which is 8.7 and 10.8 points higher than the camera-only baseline, as well as yielding competitive performance on the
arXiv Detail & Related papers (2022-09-14T10:25:30Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the robustness of state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which is important for many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately; the extracted features are then fused by effective residual-based fusion modules (a rough sketch of such a module appears after this list).
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
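As referenced in the EPMF/PMF entry above, the following is a rough sketch of a residual-based fusion module: camera and LiDAR feature maps that share a spatial grid are concatenated, mixed by a small convolutional block, and added back to the LiDAR branch. The module structure, layer choices, and channel sizes are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical residual-based fusion module in the spirit of the PMF entry above.
# Channel sizes, layers, and names are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, lidar_channels: int, camera_channels: int):
        super().__init__()
        # Mix concatenated features back down to the LiDAR channel width.
        self.mix = nn.Sequential(
            nn.Conv2d(lidar_channels + camera_channels, lidar_channels,
                      kernel_size=3, padding=1),
            nn.BatchNorm2d(lidar_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(lidar_channels, lidar_channels, kernel_size=3, padding=1),
        )

    def forward(self, lidar_feat: torch.Tensor, camera_feat: torch.Tensor) -> torch.Tensor:
        # Both feature maps are assumed to be aligned to the same (H, W) grid,
        # e.g. a perspective projection of the point cloud.
        fused = self.mix(torch.cat([lidar_feat, camera_feat], dim=1))
        return lidar_feat + fused  # residual connection keeps the LiDAR branch dominant

if __name__ == "__main__":
    module = ResidualFusion(lidar_channels=64, camera_channels=32)
    lidar = torch.randn(1, 64, 48, 160)
    camera = torch.randn(1, 32, 48, 160)
    out = module(lidar, camera)
    print(out.shape)  # torch.Size([1, 64, 48, 160])
```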
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.