DUFormer: Solving Power Line Detection Task in Aerial Images using
Semantic Segmentation
- URL: http://arxiv.org/abs/2304.05821v2
- Date: Thu, 31 Aug 2023 14:15:51 GMT
- Title: DUFormer: Solving Power Line Detection Task in Aerial Images using
Semantic Segmentation
- Authors: Deyu An, Qiang Zhang, Jianshu Chao, Ting Li, Feng Qiao, Yong Deng,
Zhenpeng Bian
- Abstract summary: Unmanned aerial vehicles (UAVs) are frequently used for inspecting power lines and capturing high-resolution aerial images.
To tackle this problem, we introduce DUFormer, a semantic segmentation algorithm explicitly designed to detect power lines in aerial images.
Our proposed method outperforms all state-of-the-art methods in power line segmentation on the publicly accessible TTPLA dataset.
- Score: 17.77548837421917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned aerial vehicles (UAVs) are frequently used for inspecting power
lines and capturing high-resolution aerial images. However, detecting power
lines in aerial images is difficult, as the foreground data (i.e., power lines) is
small and the background information is abundant. To tackle this problem, we
introduce DUFormer, a semantic segmentation algorithm explicitly designed to
detect power lines in aerial images. We presuppose that it is advantageous to
train an efficient Transformer model with sufficient feature extraction using a
convolutional neural network (CNN) with a strong inductive bias. With this goal
in mind, we introduce a heavy token encoder that performs overlapping feature
remodeling and tokenization. The encoder comprises a pyramid CNN feature
extraction module and a power line feature enhancement module. After successful
local feature extraction for power lines, feature fusion is conducted. Then, the
Transformer block is used for global modeling. The final segmentation result is
achieved by amalgamating local and global features in the decode head. Moreover,
we demonstrate the importance of the joint multi-weight loss function in power
line segmentation. Our experimental results show that our proposed method
outperforms all state-of-the-art methods in power line segmentation on the
publicly accessible TTPLA dataset.
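The abstract outlines a CNN-then-Transformer pipeline but gives no implementation details, so the following PyTorch sketch only illustrates the general shape of such a hybrid: an overlapping-convolution token encoder, a Transformer block for global modeling, and a decode head that fuses local and global features. All module names, channel widths, and depths here are assumptions for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidCNNEncoder(nn.Module):
    """Stand-in for a heavy token encoder: overlapping strided convolutions
    (kernel 3, stride 2) extract multi-scale local features before tokenization."""
    def __init__(self, in_ch=3, dims=(32, 64, 128)):
        super().__init__()
        layers, c = [], in_ch
        for d in dims:
            layers += [nn.Conv2d(c, d, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(d),
                       nn.ReLU(inplace=True)]
            c = d
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)  # (B, C, H/8, W/8) local feature map

class HybridSegmenter(nn.Module):
    """Local CNN features -> Transformer global modeling -> fused decode head."""
    def __init__(self, num_classes=1, dim=128):
        super().__init__()
        self.encoder = PyramidCNNEncoder(dims=(32, 64, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Decode head fuses local (CNN) and global (Transformer) features.
        self.head = nn.Conv2d(2 * dim, num_classes, kernel_size=1)

    def forward(self, x):
        local = self.encoder(x)                    # local power-line features
        b, c, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)  # (B, h*w, C) token sequence
        glob = self.transformer(tokens)            # global modeling
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        fused = torch.cat([local, glob], dim=1)    # amalgamate local + global
        logits = self.head(fused)
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)
```

For instance, `HybridSegmenter()(torch.randn(1, 3, 512, 512))` yields a (1, 1, 512, 512) logit map upsampled back to the input resolution.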
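Similarly, the abstract highlights a joint multi-weight loss without specifying its terms. One plausible reading, common for thin-structure segmentation where foreground pixels are scarce, is a weighted sum of class-balanced binary cross-entropy and Dice loss; the function name and weights below are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def joint_multi_weight_loss(logits, target, w_bce=1.0, w_dice=1.0, pos_weight=5.0):
    """Illustrative joint loss: the paper's exact terms and weights are not
    given in the abstract; weighted BCE + Dice is one plausible form for thin
    foregrounds such as power lines."""
    pw = torch.tensor([pos_weight], device=logits.device)
    # Class-balanced BCE: up-weight the rare power-line pixels.
    bce = F.binary_cross_entropy_with_logits(logits, target, pos_weight=pw)
    # Dice term: directly optimizes region overlap, robust to class imbalance.
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = 1 - (2 * inter + 1.0) / (probs.sum() + target.sum() + 1.0)
    return w_bce * bce + w_dice * dice
```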
Related papers
- CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer [8.962657021133925]
Cross-scale transformer (CT) processes feature representations at different stages without additional computation.
We introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales.
We also present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction.
arXiv Detail & Related papers (2023-12-14T01:33:18Z)
- Effective Image Tampering Localization via Enhanced Transformer and Co-attention Fusion [5.691973573807887]
We propose an effective image tampering localization network (EITLNet) based on a two-branch enhanced transformer encoder.
The features extracted from RGB and noise streams are fused effectively by the coordinate attention-based fusion module.
arXiv Detail & Related papers (2023-09-17T15:43:06Z)
- ABC: Attention with Bilinear Correlation for Infrared Small Target Detection [4.7379300868029395]
CNN-based deep learning methods are not effective at segmenting infrared small targets (IRST).
We propose a new model called attention with bilinear correlation (ABC).
ABC is based on the transformer architecture and includes a convolution linear fusion transformer (CLFT) module with a novel attention mechanism for feature extraction and fusion.
arXiv Detail & Related papers (2023-03-18T03:47:06Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the ability of Transformers to incorporate contextual information for dynamic feature extraction is neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- PLGAN: Generative Adversarial Networks for Power-Line Segmentation in Aerial Images [15.504887854179666]
PLGAN is a simple yet effective method to segment power lines from aerial images with different backgrounds.
We exploit the appropriate form of the generated images for high-quality feature embedding.
Our proposed PLGAN outperforms the prior state-of-the-art methods for semantic segmentation and line detection.
arXiv Detail & Related papers (2022-04-14T21:43:31Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Applied to ViT, a recent transformer-based image recognition model, the approach shows consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to conduct dynamic convolution on their corresponding input feature maps (see the sketch after this entry).
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
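Since the MFGNet summary above describes generating modality-aware filters and applying them as dynamic convolutions, here is a minimal, hedged sketch of that general mechanism. The class name, sizes, and the global-pooling filter generator are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterConv(nn.Module):
    """Predicts a per-sample, per-channel depthwise kernel from pooled
    features, then applies it via grouped convolution."""
    def __init__(self, channels=64, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # One k x k depthwise kernel per channel, generated from global context.
        self.gen = nn.Linear(channels, channels * k * k)

    def forward(self, feat):
        b, c, h, w = feat.shape
        ctx = feat.mean(dim=(2, 3))                      # (B, C) global context
        kernels = self.gen(ctx).view(b * c, 1, self.k, self.k)
        # Fold batch into groups so each sample/channel gets its own filter.
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```

In the spirit of the summary, a visible-branch and a thermal-branch instance of such a module could exchange context by generating each other's filters, which is the cross-modal message passing the entry describes.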