Salient Object Detection in Optical Remote Sensing Images Driven by
Transformer
- URL: http://arxiv.org/abs/2309.08206v1
- Date: Fri, 15 Sep 2023 07:14:43 GMT
- Authors: Gongyang Li and Zhen Bai and Zhi Liu and Xinpeng Zhang and Haibin Ling
- Abstract summary: We propose a novel Global Extraction Local Exploration Network (GeleNet) for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
- Score: 69.22039680783124
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing methods for Salient Object Detection in Optical Remote Sensing
Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the
backbone, such as VGG and ResNet. Since CNNs can only extract features within
certain receptive fields, most ORSI-SOD methods generally follow the
local-to-contextual paradigm. In this paper, we propose a novel Global
Extraction Local Exploration Network (GeleNet) for ORSI-SOD following the
global-to-local paradigm. Specifically, GeleNet first adopts a transformer
backbone to generate four-level feature embeddings with global long-range
dependencies. Then, GeleNet employs a Direction-aware Shuffle Weighted Spatial
Attention Module (D-SWSAM) and its simplified version (SWSAM) to enhance local
interactions, and a Knowledge Transfer Module (KTM) to further enhance
cross-level contextual interactions. D-SWSAM comprehensively perceives the
orientation information in the lowest-level features through directional
convolutions to adapt to various orientations of salient objects in ORSIs, and
effectively enhances the details of salient objects with an improved attention
mechanism. SWSAM discards the direction-aware part of D-SWSAM to focus on
localizing salient objects in the highest-level features. KTM models the
contextual correlation knowledge of two middle-level features of different
scales based on the self-attention mechanism, and transfers the knowledge to
the raw features to generate more discriminative features. Finally, a saliency
predictor is used to generate the saliency map based on the outputs of the
above three modules. Extensive experiments on three public datasets demonstrate
that the proposed GeleNet outperforms relevant state-of-the-art methods. The
code and results of our method are available at
https://github.com/MathLee/GeleNet.
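The abstract names two ingredients of SWSAM: a channel shuffle and a weighted spatial attention gate. As a loose, framework-free illustration only, the sketch below implements both on tiny nested-list tensors. The function names, the mean-plus-max gating, and the parameter-free sigmoid are assumptions made for clarity; the actual module in the paper uses learned convolutional weights (see the linked code for the real implementation).

```python
import math

def channel_shuffle(feats, groups):
    """ShuffleNet-style channel shuffle: split C channels into `groups`
    groups and interleave them. `feats` is a list of C channel maps
    (each a 2D list); C must be divisible by `groups`."""
    c = len(feats)
    assert c % groups == 0
    per = c // groups
    # Conceptually: reshape (groups, per) -> transpose -> flatten.
    return [feats[g * per + p] for p in range(per) for g in range(groups)]

def spatial_attention(feats):
    """Toy spatial attention: build a per-pixel gate from the channel-wise
    mean and max, squash it with a sigmoid, and re-weight every channel.
    (A learned module would replace this fixed gate with convolutions.)"""
    h, w = len(feats[0]), len(feats[0][0])
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [ch[i][j] for ch in feats]
            mean_v = sum(vals) / len(vals)
            max_v = max(vals)
            attn[i][j] = 1.0 / (1.0 + math.exp(-(mean_v + max_v)))  # sigmoid
    return [[[ch[i][j] * attn[i][j] for j in range(w)] for i in range(h)]
            for ch in feats]
```

With four single-pixel channels `[0, 1, 2, 3]` and two groups, the shuffle interleaves them to `[0, 2, 1, 3]`, mixing information across the groups before the attention gate is applied.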
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Multi-Scale Direction-Aware Network for Infrared Small Target Detection [2.661766509317245]
Infrared small target detection faces the challenge of effectively separating targets from the background.
We propose a multi-scale direction-aware network (MSDA-Net) to integrate the high-frequency directional features of infrared small targets.
MSDA-Net achieves state-of-the-art (SOTA) results on the public NUDT-SIRST, SIRST and IRSTD-1k datasets.
arXiv Detail & Related papers (2024-06-04T07:23:09Z)
- ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
- M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection [22.60675416709486]
M$^3$Net is an attention network for Salient Object Detection.
It uses a cross-attention approach to achieve interaction between multilevel features.
A Mixed Attention Block models context at both global and local levels.
A multilevel supervision strategy optimizes the aggregated features stage by stage.
arXiv Detail & Related papers (2023-09-15T12:46:14Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection [5.876499671899904]
We propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection.
GroupTransNet is good at learning the long-range dependencies of cross layer features.
Experiments demonstrate that GroupTransNet outperforms comparison models.
arXiv Detail & Related papers (2022-03-21T08:00:16Z)
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel transformer-based framework, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.