SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection
- URL: http://arxiv.org/abs/2401.15583v3
- Date: Tue, 30 Apr 2024 09:40:01 GMT
- Title: SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection
- Authors: Shuai Yuan, Hanlin Qin, Xiang Yan, Naveed Akhtar, Ajmal Mian
- Abstract summary: Infrared small target detection (IRSTD) has recently benefitted greatly from U-shaped neural models.
Existing techniques struggle when the target has high similarity with the background.
We present a Spatial-channel Cross Transformer Network (SCTransNet) that leverages spatial-channel cross transformer blocks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared small target detection (IRSTD) has recently benefitted greatly from U-shaped neural models. However, largely overlooking effective global information modeling, existing techniques struggle when the target has high similarity with the background. We present a Spatial-channel Cross Transformer Network (SCTransNet) that leverages spatial-channel cross transformer blocks (SCTBs) on top of long-range skip connections to address this challenge. In the proposed SCTBs, the outputs of all encoders interact through a cross transformer to generate mixed features, which are redistributed to all decoders to effectively reinforce semantic differences between the target and clutter at full scales. Specifically, an SCTB contains two key elements: (a) spatial-embedded single-head channel-cross attention (SSCA), which exchanges local spatial features and full-level global channel information to eliminate ambiguity among the encoders and facilitate high-level semantic associations of the images, and (b) a complementary feed-forward network (CFN), which enhances feature discriminability via a multi-scale strategy and cross-spatial-channel information interaction to promote beneficial information transfer. SCTransNet effectively encodes the semantic differences between targets and backgrounds to boost its internal representation for accurately detecting small infrared targets. Extensive experiments on three public datasets, NUDT-SIRST, NUAA-SIRST, and IRSTD-1k, demonstrate that SCTransNet outperforms existing IRSTD methods. Our code will be made public at https://github.com/xdFai.
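The abstract describes SSCA as attention computed across channels (a C x C map) rather than across spatial positions, with queries from one encoder scale and keys/values from fused full-level features. Below is a minimal, hypothetical PyTorch sketch of that idea; the module names, the depth-wise-conv "spatial embedding", and all sizes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelCrossAttention(nn.Module):
    """Sketch of single-head channel-cross attention with a spatial embedding.

    Queries come from one encoder scale; keys/values come from fused
    full-level features (assumed resized to the same H x W). Attention is
    computed across channels (a C x C map) rather than spatial positions.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # local spatial embedding
        self.kv = nn.Conv2d(dim, 2 * dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, enc_feat: torch.Tensor, fused_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = enc_feat.shape
        q = self.dw(self.q(enc_feat)).flatten(2)        # B x C x (HW)
        k, v = self.kv(fused_feat).chunk(2, dim=1)
        k, v = k.flatten(2), v.flatten(2)               # B x C x (HW)
        q = F.normalize(q, dim=-1)                      # L2-normalize channel tokens
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(1, 2)).softmax(dim=-1)  # B x C x C channel-attention map
        out = (attn @ v).reshape(b, c, h, w)
        return enc_feat + self.proj(out)                # residual back onto the skip path
```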
Related papers
- LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections
We propose a remote-sensing image semantic segmentation network named LKASeg.
LKASeg combines Large Kernel Attention (LKA) and Full-Scale Skip Connections (FSC).
On the ISPRS Vaihingen dataset, LKASeg achieves mF1 and mIoU scores of 90.33% and 82.77%.
arXiv Detail & Related papers (2024-10-14T12:25:48Z)
- Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images
Object detection in aerial images has always been a challenging task due to the generally small size of the objects.
Most current detectors prioritize novel detection frameworks, often overlooking research on fundamental components such as feature pyramid networks.
We introduce the Cross-Layer Feature Pyramid Transformer (CFPT), a novel upsampler-free feature pyramid network designed specifically for small object detection in aerial images.
arXiv Detail & Related papers (2024-07-29T04:40:18Z)
- Relating CNN-Transformer Fusion Network for Change Detection
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features.
Experiments demonstrate RCTNet's clear superiority over traditional remote-sensing image change-detection (CD) methods.
arXiv Detail & Related papers (2024-07-03T14:58:40Z)
- TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID).
We present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators that meet the blind-spot requirement.
For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution.
For channel self-attention, we observe that it may leak the blind-spot information when the channel number is greater than the spatial size in the deep layers of multi-scale architectures.
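The masking idea described above can be sketched in a few lines: attention logits at forbidden positions are set to -inf before the softmax, so each query's receptive field is restricted. This is a hypothetical illustration of the mechanism only; TBSN's actual mask pattern and operators are more elaborate.

```python
import torch

def masked_spatial_attention(q: torch.Tensor, k: torch.Tensor,
                             v: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """q, k, v: (B, heads, N, d); mask: (N, N) bool, True = forbidden.

    Setting forbidden logits to -inf before the softmax restricts each
    query's receptive field, e.g. excluding its own pixel to preserve
    the blind spot.
    """
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.transpose(-2, -1)) * scale        # (B, heads, N, N)
    logits = logits.masked_fill(mask, float("-inf"))  # block forbidden positions
    return logits.softmax(dim=-1) @ v

# Example: a blind-spot mask that forbids attending to one's own position.
N = 16 * 16
blind_spot_mask = torch.eye(N, dtype=torch.bool)
```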
arXiv Detail & Related papers (2024-04-11T15:39:10Z)
- Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task that aims to segment objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to use vision transformers (ViTs) to encode the image and acquire global semantic information.
We propose a dual network that combines CNN-based and transformer branches for mutually complementary learning.
arXiv Detail & Related papers (2023-09-30T08:41:11Z)
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
- Dual Aggregation Transformer for Image Super-Resolution
We propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image super-resolution (SR).
DAT aggregates features across the spatial and channel dimensions in a dual inter-block and intra-block manner (sketched below).
Our experiments show that our DAT surpasses current methods.
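The inter-block dual aggregation can be illustrated by alternating two block types, one attending over spatial tokens and one over channels, so successive blocks aggregate both dimensions. This is a hypothetical simplification; DAT's actual windowed spatial attention and intra-block details are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialBlock(nn.Module):
    """Self-attention over the N spatial tokens; input x is (B, N, C)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class ChannelBlock(nn.Module):
    """Self-attention over the C channels (transposed tokens); x is (B, N, C)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=-1)
        q = F.normalize(q.transpose(1, 2), dim=-1)        # (B, C, N)
        k = F.normalize(k.transpose(1, 2), dim=-1)
        attn = (q @ k.transpose(1, 2)).softmax(dim=-1)    # (B, C, C)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)  # back to (B, N, C)
        return x + self.proj(out)

# Alternating the two block types aggregates both dimensions across blocks.
stage = nn.Sequential(SpatialBlock(64), ChannelBlock(64),
                      SpatialBlock(64), ChannelBlock(64))
```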
arXiv Detail & Related papers (2023-08-07T07:39:39Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection
We present a convolutional neural network (CNN) model, CIR-Net, based on novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection
We propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection.
The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets.
arXiv Detail & Related papers (2022-04-12T07:37:39Z)
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds
We propose a new transformer-based infrared small-dim target detection method.
We adopt the transformer's self-attention mechanism to learn interactions among image features over a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)