D3FNet: A Differential Attention Fusion Network for Fine-Grained Road Structure Extraction in Remote Perception Systems
- URL: http://arxiv.org/abs/2508.15537v1
- Date: Thu, 21 Aug 2025 13:12:20 GMT
- Title: D3FNet: A Differential Attention Fusion Network for Fine-Grained Road Structure Extraction in Remote Perception Systems
- Authors: Chang Liu, Yang Xu, Tamas Sziranyi,
- Abstract summary: D3FNet is a Dual-Stream Differential Attention Fusion Network designed for fine-grained road structure segmentation in remote perception systems.<n>DADE module enhances subtle road features while suppressing background noise at the bottleneck.<n>DDFM integrates original and attention-modulated features to balance spatial precision with semantic context.
- Score: 5.350820088672045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting narrow roads from high-resolution remote sensing imagery remains a significant challenge due to their limited width, fragmented topology, and frequent occlusions. To address these issues, we propose D3FNet, a Dilated Dual-Stream Differential Attention Fusion Network designed for fine-grained road structure segmentation in remote perception systems. Built upon the encoder-decoder backbone of D-LinkNet, D3FNet introduces three key innovations:(1) a Differential Attention Dilation Extraction (DADE) module that enhances subtle road features while suppressing background noise at the bottleneck; (2) a Dual-stream Decoding Fusion Mechanism (DDFM) that integrates original and attention-modulated features to balance spatial precision with semantic context; and (3) a multi-scale dilation strategy (rates 1, 3, 5, 9) that mitigates gridding artifacts and improves continuity in narrow road prediction. Unlike conventional models that overfit to generic road widths, D3FNet specifically targets fine-grained, occluded, and low-contrast road segments. Extensive experiments on the DeepGlobe and CHN6-CUG benchmarks show that D3FNet achieves superior IoU and recall on challenging road regions, outperforming state-of-the-art baselines. Ablation studies further verify the complementary synergy of attention-guided encoding and dual-path decoding. These results confirm D3FNet as a robust solution for fine-grained narrow road extraction in complex remote and cooperative perception scenarios.
Related papers
- TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder [66.22997415145467]
This paper presents a joint completion and detection framework that improves the detection feature in sparse areas.<n> Specifically, we propose TransBridge, a novel transformer-based up-sampling block that fuses the features from the detection and completion networks.<n>The results show that our framework consistently improves end-to-end 3D object detection, with the mean average precision (mAP) ranging from 0.7 to 1.5 across multiple methods.
arXiv Detail & Related papers (2025-12-12T00:08:03Z) - CMF-IoU: Multi-Stage Cross-Modal Fusion 3D Object Detection with IoU Joint Prediction [29.7092783661859]
Multi-modal methods based on camera and LiDAR sensors have garnered significant attention in the field of 3D detection.<n>We introduce a multi-stage cross-modal fusion 3D detection framework, termed CMF-IOU, to address the challenge of aligning 3D spatial and 2D semantic information.
arXiv Detail & Related papers (2025-08-18T13:32:07Z) - DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion [9.225796678303487]
We propose DualDiff, a dual-branch conditional diffusion model designed to enhance multi-view driving scene generation.<n>We introduce Occupancy Ray Sampling (ORS), a semantic-rich 3D representation, alongside numerical driving scene representation.<n>To improve cross-modal information integration, we propose a Semantic Fusion Attention (SFA) mechanism that aligns and fuses features across modalities.
arXiv Detail & Related papers (2025-05-03T16:20:01Z) - Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures.<n>Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts.<n>We propose a dual-stage Foundation-to-Diffusion framework that precisely align 2D prior guidance from vision foundation models.
arXiv Detail & Related papers (2025-03-22T14:42:27Z) - DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance [5.113012982922924]
We present DualDiff, a conditional diffusion model designed to enhance driving scene generation across multiple views and video sequences.<n>To improve synthesis of fine-grained foreground objects, we propose a Foreground-Aware Mask (FGM) denoising loss function.<n>We also develop the Semantic Fusion Attention (SFA) mechanism to dynamically prioritize relevant information and suppress noise.
arXiv Detail & Related papers (2025-03-05T17:31:45Z) - Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.<n>Traditional encoder-decoder architectures may not have adequately considered the robustness against noise originating from defective modalities.<n>We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.<n>Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution
Remote Sensing Images [19.07341794770722]
An enhanced deep neural network model termed Dual-Decoder-U-Net (DDU-Net) is proposed in this paper.
The proposed model outperforms the state-of-the-art DenseUNet, DeepLabv3+ and D-LinkNet by 6.5%, 3.3%, and 2.1% in the mean Intersection over Union (mIoU) and by 4%, 4.8%, and 3.1% in the F1 score, respectively.
arXiv Detail & Related papers (2022-01-18T05:27:49Z) - EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object
Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion(CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z) - Road Network Guided Fine-Grained Urban Traffic Flow Inference [108.64631590347352]
Accurate inference of fine-grained traffic flow from coarse-grained one is an emerging yet crucial problem.
We propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that exploits the prior knowledge of road networks.
Our method can generate high-quality fine-grained traffic flow maps.
arXiv Detail & Related papers (2021-09-29T07:51:49Z) - DiResNet: Direction-aware Residual Network for Road Extraction in VHR
Remote Sensing Images [12.081877372552606]
We present a direction-aware residual network (DiResNet) that includes three main contributions.
The proposed method has advantages in both overall accuracy and F1-score.
arXiv Detail & Related papers (2020-05-14T19:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.