Deep Sensor Fusion with Pyramid Fusion Networks for 3D Semantic
Segmentation
- URL: http://arxiv.org/abs/2205.13629v1
- Date: Thu, 26 May 2022 20:57:19 GMT
- Title: Deep Sensor Fusion with Pyramid Fusion Networks for 3D Semantic
Segmentation
- Authors: Hannah Schieber, Fabian Duerr, Torsten Schoen and Jürgen Beyerer
- Abstract summary: This work presents a pyramid-based deep fusion architecture for lidar and camera to improve 3D semantic segmentation of traffic scenes.
A novel Pyramid Fusion Backbone fuses feature maps at different scales to compute valuable multimodal, multi-scale features.
The approach is evaluated on two challenging outdoor datasets and different fusion strategies and setups are investigated.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust environment perception for autonomous vehicles is a tremendous
challenge, which makes a diverse sensor set with, e.g., camera, lidar and radar
crucial. In the process of understanding the recorded sensor data, 3D semantic
segmentation plays an important role. Therefore, this work presents a
pyramid-based deep fusion architecture for lidar and camera to improve 3D
semantic segmentation of traffic scenes. Individual sensor backbones extract
feature maps of camera images and lidar point clouds. A novel Pyramid Fusion
Backbone fuses these feature maps at different scales and combines the
multimodal features in a feature pyramid to compute valuable multimodal,
multi-scale features. The Pyramid Fusion Head aggregates these pyramid features
and further refines them in a late fusion step, incorporating the final
features of the sensor backbones. The approach is evaluated on two challenging
outdoor datasets and different fusion strategies and setups are investigated.
It outperforms recent range-view-based lidar approaches as well as all fusion
strategies and architectures proposed so far.
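
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the two fusion stages named in the abstract: a Pyramid Fusion Backbone that fuses per-scale camera and lidar feature maps, and a Pyramid Fusion Head that aggregates the pyramid and refines it in a late fusion step with the final sensor-backbone features. All internals (channel sizes, 1x1 convolutions, bilinear upsampling, the range-view-style grid) are illustrative assumptions, not the authors' implementation.

# Minimal sketch of pyramid-style lidar-camera fusion for semantic segmentation.
# Module names follow the abstract; everything inside them is assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidFusionBackbone(nn.Module):
    """Fuses per-scale camera and lidar feature maps into a multimodal pyramid."""

    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        # One fusion block per pyramid scale: concatenate both modalities, then mix.
        self.fuse = nn.ModuleList([nn.Conv2d(2 * c, c, kernel_size=1) for c in channels])

    def forward(self, cam_feats, lidar_feats):
        # cam_feats / lidar_feats: lists of per-scale feature maps, assumed to be
        # already aligned to a common (range-view-style) grid at each scale.
        return [
            fuse(torch.cat([c, l], dim=1))
            for fuse, c, l in zip(self.fuse, cam_feats, lidar_feats)
        ]


class PyramidFusionHead(nn.Module):
    """Aggregates the fused pyramid and refines it with the final sensor features (late fusion)."""

    def __init__(self, channels=(64, 128, 256), out_channels=64, num_classes=20):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(c, out_channels, kernel_size=1) for c in channels])
        # Late fusion: aggregated pyramid + final camera + final lidar features.
        self.late_fuse = nn.Conv2d(out_channels + 2 * channels[-1], out_channels,
                                   kernel_size=3, padding=1)
        self.classifier = nn.Conv2d(out_channels, num_classes, kernel_size=1)

    def forward(self, pyramid, cam_final, lidar_final):
        target = pyramid[0].shape[-2:]  # finest pyramid resolution in this sketch
        # Upsample every pyramid level to the finest resolution and sum them.
        agg = sum(
            F.interpolate(r(p), size=target, mode="bilinear", align_corners=False)
            for r, p in zip(self.reduce, pyramid)
        )
        cam_up = F.interpolate(cam_final, size=target, mode="bilinear", align_corners=False)
        lidar_up = F.interpolate(lidar_final, size=target, mode="bilinear", align_corners=False)
        refined = self.late_fuse(torch.cat([agg, cam_up, lidar_up], dim=1))
        return self.classifier(refined)


if __name__ == "__main__":
    # Toy shapes: three pyramid scales on a 64x512 range-view grid.
    cams = [torch.randn(1, c, 64 // s, 512 // s) for c, s in zip((64, 128, 256), (1, 2, 4))]
    lids = [torch.randn_like(c) for c in cams]
    pyramid = PyramidFusionBackbone()(cams, lids)
    logits = PyramidFusionHead()(pyramid, cams[-1], lids[-1])
    print(logits.shape)  # torch.Size([1, 20, 64, 512])
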
Related papers
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities into either Bird's Eye View (BEV) or Perspective View (PV).
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z) - Multi-scale Feature Fusion with Point Pyramid for 3D Object Detection [18.41721888099563]
This paper proposes the Point Pyramid RCNN (POP-RCNN), a feature pyramid-based framework for 3D object detection on point clouds.
The proposed method can be applied to a variety of existing frameworks to increase feature richness, especially for long-distance detection.
arXiv Detail & Related papers (2024-09-06T20:13:14Z) - FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
Integrating data from diverse sensor modalities is a common approach in autonomous driving.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z) - PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery [30.522327480291295]
We propose a novel Mamba-based segmentation network, namely PyramidMamba.
Specifically, we design a dense spatial pyramid pooling (DSPP) to encode rich multi-scale semantic features and a pyramid fusion Mamba (PFM) to reduce semantic redundancy in multi-scale feature fusion.
Our PyramidMamba yields state-of-the-art performance on three publicly available datasets.
arXiv Detail & Related papers (2024-06-16T07:43:40Z) - DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras
and Radars [2.2166853714891057]
We propose a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection.
Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible.
Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach.
arXiv Detail & Related papers (2022-09-26T14:33:30Z) - MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth
Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z) - Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method that realizes feature-level fusion under bird's-eye view (BEV) for a better feature representation is proposed.
arXiv Detail & Related papers (2022-08-25T13:21:37Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth
Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and Virtual KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z) - Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
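
A building block shared by several of the lidar-camera approaches listed above is associating 3D lidar points with camera features by projecting the points into the image plane and sampling the camera feature map at the projected locations. The following is a minimal sketch of that step in PyTorch; the function name, calibration matrices and feature shapes are assumptions for illustration, not taken from any specific paper above.

# Hedged sketch: project lidar points into the camera image and sample per-point
# camera features for fusion. Calibration conventions are assumed, not paper-specific.
import torch
import torch.nn.functional as F


def sample_camera_features(points, cam_feats, K, T_cam_lidar, image_size):
    """Project lidar points into the image and bilinearly sample camera features.

    points:       (N, 3) lidar-frame xyz coordinates
    cam_feats:    (1, C, Hf, Wf) camera feature map
    K:            (3, 3) camera intrinsics
    T_cam_lidar:  (4, 4) lidar-to-camera extrinsics
    image_size:   (H, W) of the image the intrinsics refer to
    Returns a (N, C) tensor; points outside the image or behind the camera get zeros.
    """
    # Transform to the camera frame (homogeneous coordinates).
    ones = torch.ones(points.shape[0], 1)
    pts_cam = (T_cam_lidar @ torch.cat([points, ones], dim=1).T).T[:, :3]
    valid = pts_cam[:, 2] > 0.1  # keep points in front of the camera

    # Pinhole projection to pixel coordinates (z clamped to avoid division by zero).
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)

    # Normalize to [-1, 1] for grid_sample, then sample the feature map.
    H, W = image_size
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1, 2 * uv[:, 1] / (H - 1) - 1], dim=1)
    feats = F.grid_sample(cam_feats, grid.view(1, 1, -1, 2),
                          mode="bilinear", align_corners=True)
    feats = feats.squeeze(0).squeeze(1).T  # (N, C)

    # Zero out features of invalid projections.
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    return feats * (valid & inside).float().unsqueeze(1)
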
This list is automatically generated from the titles and abstracts of the papers on this site.