SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection
- URL: http://arxiv.org/abs/2308.12863v2
- Date: Thu, 18 Jul 2024 10:13:24 GMT
- Title: SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection
- Authors: Yan Gong, Xinyu Zhang, Hao Liu, Xinmin Jiang, Zhiwei Li, Xin Gao, Lei Lin, Dafeng Jin, Jun Li, Huaping Liu
- Abstract summary: Multi-modal fusion is increasingly being used for autonomous driving tasks.
In this study, we propose a novel fusion architecture called Skip-cross Networks (SkipcrossNets).
The advantages of the skip-cross fusion strategy are demonstrated through application to the KITTI and A2D2 datasets.
- Score: 25.94434460779164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal fusion is increasingly used for autonomous driving tasks, as different modalities provide unique information for feature extraction. However, existing two-stream networks fuse features only at a specific network layer, and choosing that layer requires extensive manual experimentation. As a CNN deepens, the features of the two modalities become increasingly abstract, so fusing them across a large semantic gap can easily hurt performance. To reduce the loss of height and depth information when projecting point clouds into 2D space, we use calibration parameters to project the point cloud into Altitude Difference Images (ADIs), which exhibit more distinct road features. In this study, we propose a novel fusion architecture called Skip-cross Networks (SkipcrossNets), which adaptively combines ADIs and camera images without being bound to a certain fusion epoch. Specifically, the skip-cross fusion strategy connects each layer to each layer in a feed-forward manner: every layer takes the feature maps of all previous layers of the other modality as input, and its own feature maps serve as input to all subsequent layers of the other modality, enhancing feature propagation and multi-modal feature fusion. This strategy facilitates selection of the most similar feature layers from the two modalities, enhancing feature reuse and providing complementary effects for sparse point cloud features. The advantages of the skip-cross fusion strategy are demonstrated on the KITTI and A2D2 datasets, where the model achieves a MaxF score of 96.85% on KITTI and an F1 score of 84.84% on A2D2. The model parameters require only 2.33 MB of memory and the network runs at 68.24 FPS, making it viable for mobile terminals and embedded devices.
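The dense cross-modal connectivity described in the abstract can be sketched in a few lines of PyTorch. The snippet below is only an illustrative reading of that description, not the authors' implementation: the `SkipCrossStage` module name, the channel widths, the use of concatenation, and the rule that each layer consumes its own latest feature map plus all earlier outputs of the other stream are assumptions made for demonstration.

```python
# Illustrative sketch only (not the authors' released code): a minimal two-stream
# stage with "skip-cross" style connectivity, assuming that layer i of each stream
# receives its own latest feature map concatenated with every feature map produced
# so far by the other modality.
import torch
import torch.nn as nn


class SkipCrossStage(nn.Module):
    """Two parallel streams (camera image and ADI) with dense cross connections."""

    def __init__(self, channels: int = 32, num_layers: int = 4):
        super().__init__()
        self.num_layers = num_layers

        def block(in_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        # Layer i sees 1 own feature map plus i cross feature maps -> (i + 1) * channels.
        self.cam_layers = nn.ModuleList([block(channels * (i + 1)) for i in range(num_layers)])
        self.adi_layers = nn.ModuleList([block(channels * (i + 1)) for i in range(num_layers)])

    def forward(self, cam: torch.Tensor, adi: torch.Tensor):
        cam_feats, adi_feats = [cam], [adi]
        for i in range(self.num_layers):
            # Skip-cross links: each layer takes its stream's latest feature map
            # plus all feature maps produced so far by the other modality.
            cam_in = torch.cat([cam_feats[-1]] + adi_feats[1:], dim=1)
            adi_in = torch.cat([adi_feats[-1]] + cam_feats[1:], dim=1)
            cam_feats.append(self.cam_layers[i](cam_in))
            adi_feats.append(self.adi_layers[i](adi_in))
        return cam_feats[-1], adi_feats[-1]


if __name__ == "__main__":
    stage = SkipCrossStage(channels=32, num_layers=4)
    cam = torch.randn(1, 32, 64, 64)   # camera feature map
    adi = torch.randn(1, 32, 64, 64)   # altitude-difference-image feature map
    fused_cam, fused_adi = stage(cam, adi)
    print(fused_cam.shape, fused_adi.shape)  # both (1, 32, 64, 64)
```

Plain concatenation is used here only to make the connectivity pattern explicit; the paper's adaptive weighting of the cross connections is not reproduced in this sketch.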
Related papers
- ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification [5.863175733097434]
We propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet) to address the issue of asymmetry at the feature level.
The proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences.
We have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%.
arXiv Detail & Related papers (2024-12-03T00:03:33Z) - Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Cross-modality fusing information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods, improving mAP by 5.9% on the $M^3FD$ dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z) - Bilateral Network with Residual U-blocks and Dual-Guided Attention for Real-time Semantic Segmentation [18.393208069320362]
We design a new fusion mechanism for two-branch architecture which is guided by attention computation.
To be precise, we use the Dual-Guided Attention (DGA) module we proposed to replace some multi-scale transformations.
Experiments on Cityscapes and CamVid dataset show the effectiveness of our method.
arXiv Detail & Related papers (2023-10-31T09:20:59Z) - Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z) - SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels.
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
arXiv Detail & Related papers (2022-07-10T08:25:47Z) - Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object Tracking [23.130490413184596]
We introduce PointNet++ to obtain multi-scale deep representations of point cloud to make it adaptive to our proposed Interactive Feature Fusion.
Our method can achieve good performance on the KITTI benchmark and outperform other approaches without using multi-scale feature fusion.
arXiv Detail & Related papers (2022-03-30T13:00:27Z) - Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which is important for many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Spatio-Contextual Deep Network Based Multimodal Pedestrian Detection For Autonomous Driving [1.2599533416395765]
This paper proposes an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images.
Its novel deep network architecture is capable of exploiting multimodal input efficiently.
The results on each evaluated dataset improved on the respective state-of-the-art performance.
arXiv Detail & Related papers (2021-05-26T17:50:36Z) - FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation [30.736361776703568]
Scene understanding based on LiDAR point cloud is an essential task for autonomous cars to drive safely.
Most existing methods simply stack different point attributes/modalities as image channels to increase information capacity.
We design FPS-Net, a convolutional fusion network that exploits the uniqueness and discrepancy among the projected image channels for optimal point cloud segmentation.
arXiv Detail & Related papers (2021-03-01T04:08:28Z) - Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]
We develop a computation-efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)