DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution
Remote Sensing Images
- URL: http://arxiv.org/abs/2201.06750v1
- Date: Tue, 18 Jan 2022 05:27:49 GMT
- Title: DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution
Remote Sensing Images
- Authors: Ying Wang, Yuexing Peng, Xinran Liu, Wei Li, George C.
Alexandropoulos, Junchuan Yu, Daqing Ge, Wei Xiang
- Abstract summary: An enhanced deep neural network model termed Dual-Decoder-U-Net (DDU-Net) is proposed in this paper.
The proposed model outperforms the state-of-the-art DenseUNet, DeepLabv3+ and D-LinkNet by 6.5%, 3.3%, and 2.1% in the mean Intersection over Union (mIoU) and by 4%, 4.8%, and 3.1% in the F1 score, respectively.
- Score: 19.07341794770722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting roads from high-resolution remote sensing images (HRSIs) is vital
in a wide variety of applications, such as autonomous driving, path planning,
and road navigation. Due to their long, thin shape and the shadows cast by
vegetation and buildings, small-sized roads are more difficult to discern. In
order to improve the reliability and accuracy of small-sized road
extraction when roads of multiple sizes coexist in an HRSI, an enhanced deep
neural network model termed Dual-Decoder-U-Net (DDU-Net) is proposed in this
paper. Building on the U-Net model, a small decoder is added to form a
dual-decoder structure that captures more detailed features. In addition, we introduce
the dilated convolution attention module (DCAM) between the encoder and
decoders to increase the receptive field as well as to distill multi-scale
features through cascading dilated convolution and global average pooling. The
convolutional block attention module (CBAM) is also embedded in the parallel
dilated convolution and pooling branches to capture more attention-aware
features. Extensive experiments are conducted on the Massachusetts Roads
dataset with experimental results showing that the proposed model outperforms
the state-of-the-art DenseUNet, DeepLabv3+ and D-LinkNet by 6.5%, 3.3%, and
2.1% in the mean Intersection over Union (mIoU), and by 4%, 4.8%, and 3.1% in
the F1 score, respectively. Both ablation and heatmap analyses are presented to
validate the effectiveness of the proposed model.
Related papers
- UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving [47.590099762244535]
Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks.
This research delves into multi-modal Masked Autoencoders tailored for a unified representation space in autonomous driving.
To intricately marry the semantics inherent in images with the geometric intricacies of LiDAR point clouds, we propose UniM$^2$AE.
arXiv Detail & Related papers (2023-08-21T02:13:40Z) - UniTR: A Unified and Efficient Multi-Modal Transformer for
Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z) - FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z) - AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D
Object Detection [17.526914782562528]
We propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign.
Our best model reaches 72.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-art results.
arXiv Detail & Related papers (2022-07-21T06:17:23Z) - EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object
Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z) - Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust
Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Spatio-Contextual Deep Network Based Multimodal Pedestrian Detection For
Autonomous Driving [1.2599533416395765]
This paper proposes an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images.
Its novel deep network architecture is capable of exploiting multimodal input efficiently.
The results on each evaluated dataset improved upon the respective state-of-the-art performance.
arXiv Detail & Related papers (2021-05-26T17:50:36Z) - A Single Stream Network for Robust and Real-time RGB-D Salient Object
Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z) - Binary DAD-Net: Binarized Driveable Area Detection Network for
Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net).
It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part.
It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.