SAR Strikes Back: A New Hope for RSVQA
- URL: http://arxiv.org/abs/2501.08131v1
- Date: Tue, 14 Jan 2025 14:07:48 GMT
- Title: SAR Strikes Back: A New Hope for RSVQA
- Authors: Lucrezia Tosato, Flora Weissgerber, Laurent Wendling, Sylvain Lobry
- Abstract summary: We present a dataset that allows for the introduction of SAR images in the RSVQA framework.
SAR images capture electromagnetic information from the scene, and are less affected by atmospheric conditions, such as clouds.
We show that SAR data offers additional information when fused with the optical modality, particularly for questions related to specific land cover classes, such as water areas.
- Abstract: Remote sensing visual question answering (RSVQA) is a task that automatically extracts information from satellite images and processes a question to predict the answer from the images in textual form, helping with the interpretation of the image. While different methods have been proposed to extract information from optical images with different spectral bands and resolutions, no method has been proposed to answer questions from Synthetic Aperture Radar (SAR) images. SAR images capture electromagnetic information from the scene and are less affected by atmospheric conditions, such as clouds. In this work, our objective is to introduce SAR into the RSVQA task and to find the best way to use this modality. In our research, we carry out a study on different pipelines for the task of RSVQA, taking into account information from both SAR and optical data. To this end, we also present a dataset that allows for the introduction of SAR images in the RSVQA framework. We propose two different models to include the SAR modality. The first is an end-to-end method in which we add an additional encoder for the SAR modality. In the second approach, we build on a two-stage framework: first, relevant information is extracted from SAR and, optionally, optical data; this information is then translated into natural language to be used in the second step, which relies only on a language model to provide the answer. We find that the second pipeline allows us to obtain good results with SAR images alone. We then evaluate several fusion methods for using SAR and optical images together, finding that fusion at the decision level achieves the best results on the proposed dataset. We show that SAR data offers additional information when fused with the optical modality, particularly for questions related to specific land cover classes, such as water areas.
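The decision-level fusion that the abstract reports as best-performing can be illustrated with a minimal sketch. This is not the paper's implementation: the answer vocabulary, the function name `fuse_decisions`, and the weighted-averaging rule are all illustrative assumptions; the general idea is simply that each modality-specific model produces its own answer distribution, and the distributions are combined only at the final decision step.

```python
# Hypothetical sketch of decision-level (late) fusion for RSVQA.
# Assumption: each modality-specific model already outputs a
# probability distribution over a shared answer vocabulary.
ANSWERS = ["yes", "no", "water", "buildings"]  # illustrative vocabulary

def fuse_decisions(p_optical, p_sar, w_optical=0.5):
    """Average the two per-modality answer distributions
    (weighted by w_optical) and return the top answer."""
    if not 0.0 <= w_optical <= 1.0:
        raise ValueError("w_optical must be in [0, 1]")
    fused = [w_optical * po + (1.0 - w_optical) * ps
             for po, ps in zip(p_optical, p_sar)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return ANSWERS[best], fused

# Example: the optical model is unsure, while the SAR model is
# confident about "water" (a class SAR distinguishes well).
p_opt = [0.30, 0.30, 0.25, 0.15]
p_sar = [0.05, 0.05, 0.85, 0.05]
answer, fused = fuse_decisions(p_opt, p_sar)
```

Because the modalities are combined only after each produces its own prediction, either branch can also run alone, which is consistent with the paper's finding that the language-based pipeline already works with SAR images by themselves.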
Related papers
- Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering [26.8129265632403]
Current Remote Sensing Visual Question Answering (RSVQA) methods are limited by the imaging mechanisms of optical sensors.
We propose a Text-guided Coarse-to-Fine Fusion Network (TGFNet) to improve RSVQA performance.
We create the first large-scale benchmark dataset for evaluating optical-SAR RSVQA methods.
arXiv Detail & Related papers (2024-11-24T09:48:03Z)
- A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning [10.748210940033484]
Current visual question answering (VQA) tasks often require constructing multimodal datasets and fine-tuning visual language models.
This letter proposes a novel VQA approach that integrates object detection networks with visual language models.
This integration aims to enhance the capabilities of VQA systems, focusing on aspects such as ship location, density, and size analysis.
arXiv Detail & Related papers (2024-11-03T06:03:39Z)
- Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR).
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z)
- Deep Learning Based Speckle Filtering for Polarimetric SAR Images. Application to Sentinel-1 [51.404644401997736]
We propose a complete framework to remove speckle in polarimetric SAR images using a convolutional neural network.
Experiments show that the proposed approach offers exceptional results in both speckle reduction and resolution preservation.
arXiv Detail & Related papers (2024-08-28T10:07:17Z)
- Can SAR improve RSVQA performance? [1.6249398255272318]
We study whether Synthetic Aperture Radar (SAR) images can be beneficial to this field.
We investigate the classification results of SAR alone and investigate the best method to extract information from SAR data.
In the last phase, we investigate how SAR images and a combination of different modalities behave in RSVQA compared to a method only using optical images.
arXiv Detail & Related papers (2024-08-28T08:53:20Z)
- SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the expressiveness of user queries.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- PeaceGAN: A GAN-based Multi-Task Learning Method for SAR Target Image Generation with a Pose Estimator and an Auxiliary Classifier [50.17500790309477]
We propose a novel GAN-based multi-task learning (MTL) method for SAR target image generation, called PeaceGAN.
PeaceGAN uses both pose angle and target class information, which makes it possible to produce SAR target images of desired target classes at intended pose angles.
arXiv Detail & Related papers (2021-03-29T10:03:09Z)
- The QXS-SAROPT Dataset for Deep Learning in SAR-Optical Data Fusion [14.45289690639374]
We publish the QXS-SAROPT dataset to foster deep learning research in SAR-optical data fusion.
We show exemplary results for two representative applications, namely SAR-optical image matching and SAR ship detection boosted by cross-modal information from optical images.
arXiv Detail & Related papers (2021-03-15T10:22:46Z)
- SAR2SAR: a semi-supervised despeckling algorithm for SAR images [3.9490074068698]
A self-supervised deep learning algorithm, SAR2SAR, is proposed in this paper.
The strategy to adapt it to SAR despeckling is presented, based on a compensation of temporal changes and a loss function adapted to the statistics of speckle.
Results on real images are discussed, to show the potential of the proposed algorithm.
arXiv Detail & Related papers (2020-06-26T15:07:28Z)
- On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID [57.71601467271486]
This article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation.
We first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations.
Following the presented guidance, we also provide an example of building an RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset.
arXiv Detail & Related papers (2020-06-22T17:59:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.