Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and
Local Information
- URL: http://arxiv.org/abs/2204.09860v1
- Date: Thu, 21 Apr 2022 03:18:09 GMT
- Title: Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and
Local Information
- Authors: Zhiqiang Yuan, Wenkai Zhang, Changyuan Tian, Xuee Rong, Zhengyuan
Zhang, Hongqi Wang, Kun Fu, and Xian Sun
- Abstract summary: Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability to enable fast and flexible information extraction from remote sensing (RS) images.
We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels.
Experiments on public datasets strongly demonstrate the state-of-the-art performance of the GaLR method on the RSCTIR task.
- Score: 15.32353270625554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become
an urgent research hotspot due to its ability to enable fast and flexible
information extraction from remote sensing (RS) images. However, current RSCTIR
methods mainly focus on the global features of RS images, neglecting the local
features that reflect target relationships and saliency. In
this article, we first propose a novel RSCTIR framework based on global and
local information (GaLR), and design a multi-level information dynamic fusion
(MIDF) module to efficaciously integrate features of different levels. MIDF
leverages local information to correct global information, utilizes global
information to supplement local information, and uses the dynamic addition of
the two to generate a prominent visual representation. To alleviate the pressure
of redundant targets on the graph convolution network (GCN) and to improve
the model's attention to salient instances when modeling local features, a
de-noised representation matrix and an enhanced adjacency matrix (DREA) are
devised to assist the GCN in producing superior local representations. DREA not
only filters out redundant features with high similarity, but also obtains more
powerful local features by enhancing the features of prominent objects.
Finally, to make full use of the information in the similarity matrix during
inference, we propose a plug-and-play multivariate rerank (MR) algorithm.
The algorithm uses the k nearest neighbors of the retrieval results to
perform a reverse search and improves performance by combining multiple
components of bidirectional retrieval. Extensive experiments on public datasets
strongly demonstrate the state-of-the-art performance of the GaLR method on the
RSCTIR task. The code of the GaLR method, the MR algorithm, and corresponding files
have been made available at https://github.com/xiaoyuan1996/GaLR.
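To make the fusion mechanism concrete, below is a minimal PyTorch sketch of what a MIDF-style dynamic fusion module could look like. The class name, layer choices, and gating scheme are illustrative assumptions rather than the authors' implementation; the actual module lives in the linked repository.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Sketch of MIDF-style fusion (assumed design, not the authors' code):
    local features correct global ones, global features supplement local
    ones, and a learned gate adds the two dynamically."""

    def __init__(self, dim: int):
        super().__init__()
        self.correct = nn.Linear(dim, dim)     # local -> correction gate over global
        self.supplement = nn.Linear(dim, dim)  # global -> residual supplement for local
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, g: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
        # g, l: (batch, dim) global and local image features
        g_corr = g * torch.sigmoid(self.correct(l))  # local information corrects global
        l_supp = l + self.supplement(g)              # global information supplements local
        a = self.gate(torch.cat([g_corr, l_supp], dim=-1))
        return a * g_corr + (1 - a) * l_supp         # dynamic addition of the two
```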
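The DREA step can likewise be read as a preprocessing pass over the detected object features: discard near-duplicates, then up-weight salient nodes in the adjacency matrix handed to the GCN. In the sketch below, the similarity threshold, the mean-similarity saliency proxy, and the boost factor are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def drea(feats: torch.Tensor, sim_thresh: float = 0.9, boost: float = 0.5):
    """Hypothetical DREA-style preprocessing: de-noise redundant object
    features and enhance the adjacency matrix fed to the GCN."""
    f = F.normalize(feats, dim=-1)                 # feats: (n_objects, dim)
    sim = f @ f.t()                                # pairwise cosine similarity
    keep = torch.ones(f.size(0), dtype=torch.bool)
    for i in range(f.size(0)):
        if not keep[i]:
            continue
        for j in range(i + 1, f.size(0)):
            if keep[j] and sim[i, j] > sim_thresh:
                keep[j] = False                    # filter redundant, highly similar features
    f, sim = f[keep], sim[keep][:, keep]
    saliency = sim.mean(dim=1)                     # crude saliency proxy: mean similarity
    adj = sim * (1.0 + boost * saliency.unsqueeze(0))  # enhanced adjacency matrix
    return f, adj
```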
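Finally, the reverse-search idea behind MR can be sketched over a plain similarity matrix. Only the k-nearest-neighbor reverse search comes from the abstract; the reciprocal-rank bonus used here is an assumed stand-in for the paper's multivariate combination of bidirectional retrieval components.

```python
import numpy as np

def multivariate_rerank(sim: np.ndarray, k: int = 5, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical MR-style rerank: for each text query, reverse-search its
    top-k retrieved images and reward mutual nearest neighbors."""
    reranked = sim.astype(np.float64).copy()       # sim: (n_texts, n_images)
    for t in range(sim.shape[0]):
        topk = np.argsort(-sim[t])[:k]             # forward search: top-k images for text t
        for img in topk:
            # reverse search: rank of text t among all texts for this image
            rank = int(np.nonzero(np.argsort(-sim[:, img]) == t)[0][0])
            reranked[t, img] += alpha / (rank + 1)  # reciprocal-rank bonus
    return reranked
```

Being plug-and-play, a rerank of this kind needs only the similarity matrix produced at inference time, so it composes with any retrieval model.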
Related papers
- Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval [16.995114000869833]
We propose CMPAGL, a cross-modal pre-aligned method leveraging global and local information.
Our Gswin transformer block combines local window self-attention and global-local window cross-attention to capture multi-scale features.
Experiments on four datasets, including RSICD and RSITMD, validate CMPAGL's effectiveness.
arXiv Detail & Related papers (2024-11-22T03:28:55Z)
- United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images [21.76732661032257]
We propose a novel United Domain Cognition Network (UDCNet) to jointly explore the global-local information in the frequency and spatial domains.
Experimental results demonstrate the superiority of the proposed UDCNet over 24 state-of-the-art models.
arXiv Detail & Related papers (2024-11-11T04:12:27Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network [2.028685490378346]
We propose a novel location refined feature pyramid network (LR-FPN) to enhance the extraction of shallow positional information.
Experiments on two large-scale remote sensing datasets demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches.
arXiv Detail & Related papers (2024-04-02T03:36:07Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
- RRSIS: Referring Remote Sensing Image Segmentation [25.538406069768662]
Localizing desired objects from remote sensing images is of great use in practical applications.
Referring image segmentation, which aims at segmenting out the objects to which a given expression refers, has been extensively studied in natural images.
We introduce referring remote sensing image segmentation (RRSIS) to fill this gap and make some insightful explorations.
arXiv Detail & Related papers (2023-06-14T16:40:19Z)
- DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution [83.47467223117361]
We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution.
Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently.
We also develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values.
arXiv Detail & Related papers (2023-01-05T12:06:47Z)
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided depth super-resolution (DSR).
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)