SAR Strikes Back: A New Hope for RSVQA
- URL: http://arxiv.org/abs/2501.08131v2
- Date: Fri, 08 Aug 2025 15:09:52 GMT
- Title: SAR Strikes Back: A New Hope for RSVQA
- Authors: Lucrezia Tosato, Flora Weissgerber, Laurent Wendling, Sylvain Lobry
- Abstract summary: Remote Sensing Visual Question Answering (RSVQA) is a task that extracts information from satellite images to answer questions in natural language. We present a dataset that enables SAR-based RSVQA and explore two pipelines for the task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote Sensing Visual Question Answering (RSVQA) is a task that extracts information from satellite images to answer questions in natural language, aiding image interpretation. While several methods exist for optical images with varying spectral bands and resolutions, only recently have high-resolution Synthetic Aperture Radar (SAR) images been explored. SAR's ability to operate in all weather conditions and capture electromagnetic features makes it a promising modality, yet no study has compared SAR and optical imagery in RSVQA or proposed effective fusion strategies. This work investigates how to integrate SAR data into RSVQA and how to best combine it with optical images. We present a dataset that enables SAR-based RSVQA and explore two pipelines for the task. The first is an end-to-end model, while the second is a two-stage framework: SAR information is first extracted and translated into text, which is then processed by a language model to produce the final answer. Our results show that the two-stage model performs better, improving accuracy by nearly 10% over the end-to-end approach. We also evaluate fusion strategies for combining SAR and optical data. A decision-level fusion yields the best results, with an F1-micro score of 75.00%, F1-average of 81.21%, and overall accuracy of 75.49% on the proposed dataset. SAR proves especially beneficial for questions related to specific land cover types, such as water areas, demonstrating its value as a complementary modality to optical imagery.
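The decision-level fusion reported in the abstract can be sketched, in a minimal and purely illustrative form, as a weighted combination of the per-answer probability vectors produced by the SAR-only and optical-only branches. The function name, weighting scheme, and answer labels below are hypothetical assumptions, not taken from the paper:

```python
def decision_level_fusion(p_sar, p_opt, w_sar=0.5):
    """Fuse per-answer probability vectors from two branches.

    p_sar, p_opt: equal-length sequences of answer probabilities
    from the SAR-only and optical-only pipelines, respectively.
    Returns the index of the fused (highest-scoring) answer.
    """
    fused = [w_sar * s + (1.0 - w_sar) * o for s, o in zip(p_sar, p_opt)]
    return max(range(len(fused)), key=fused.__getitem__)

# Example: the SAR branch is confident the area is "water", the
# optical branch is unsure (answer labels are hypothetical).
p_sar = [0.10, 0.80, 0.10]   # ["urban", "water", "forest"]
p_opt = [0.40, 0.35, 0.25]
print(decision_level_fusion(p_sar, p_opt))  # -> 1 ("water")
```

In practice the fusion weight would be tuned on a validation split; an equal weighting is used here only to keep the sketch self-contained.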
Related papers
- SARCLIP: A Vision Language Foundation Model for Semantic Understanding and Target Recognition in SAR Imagery [46.87845911116779]
We introduce SARCLIP, the first vision language foundation model tailored for the SAR domain. SARCLIP is trained using a contrastive vision-language learning approach with a domain-transfer strategy. Experiments on image-text retrieval and zero-shot classification tasks demonstrate the superior performance of SARCLIP.
arXiv Detail & Related papers (2025-10-26T13:04:50Z) - SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding [20.314150537672198]
Vision-Language Models (VLMs) have demonstrated remarkable success in RGB image understanding, offering powerful open-vocabulary interpretation and flexible language interaction. We introduce SARLANG-1M, a large-scale benchmark tailored for multimodal SAR image understanding, with a primary focus on integrating SAR with the textual modality. It features hierarchical resolutions (ranging from 0.1 to 25 meters), fine-grained semantic descriptions (including both concise and detailed captions), diverse remote sensing categories, and multi-task question-answering pairs spanning seven applications and 1,012 question types.
arXiv Detail & Related papers (2025-04-04T08:09:53Z) - Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering [26.8129265632403]
Current Remote Sensing Visual Question Answering (RSVQA) methods are limited by the imaging mechanisms of optical sensors. We propose a Text-guided Coarse-to-Fine Fusion Network (TGFNet) to improve RSVQA performance. We create the first large-scale benchmark dataset for evaluating optical-SAR RSVQA methods.
arXiv Detail & Related papers (2024-11-24T09:48:03Z) - A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning [10.748210940033484]
Current visual question answering (VQA) tasks often require constructing multimodal datasets and fine-tuning visual language models.
This letter proposes a novel VQA approach that integrates object detection networks with visual language models.
This integration aims to enhance the capabilities of VQA systems, focusing on aspects such as ship location, density, and size analysis.
arXiv Detail & Related papers (2024-11-03T06:03:39Z) - Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR).
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z) - Deep Learning Based Speckle Filtering for Polarimetric SAR Images. Application to Sentinel-1 [51.404644401997736]
We propose a complete framework to remove speckle in polarimetric SAR images using a convolutional neural network.
Experiments show that the proposed approach offers exceptional results in both speckle reduction and resolution preservation.
arXiv Detail & Related papers (2024-08-28T10:07:17Z) - Can SAR improve RSVQA performance? [1.6249398255272318]
We study whether Synthetic Aperture Radar (SAR) images can be beneficial to this field.
We investigate the classification results of SAR alone and the best method to extract information from SAR data.
In the last phase, we investigate how SAR images and a combination of different modalities behave in RSVQA compared to a method using only optical images.
arXiv Detail & Related papers (2024-08-28T08:53:20Z) - Conditional Brownian Bridge Diffusion Model for VHR SAR to Optical Image Translation [5.578820789388206]
This letter introduces a conditional image-to-image translation approach based on the Brownian Bridge Diffusion Model (BBDM). We conducted comprehensive experiments on the MSAW dataset, a collection of paired 0.5 m Very-High-Resolution (VHR) SAR and optical images.
arXiv Detail & Related papers (2024-08-15T05:43:46Z) - SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z) - SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification [1.2349871196144497]
Convolutional neural networks (CNNs) play a crucial role in capturing PolSAR image characteristics.
In this study, a novel three-branch fusion of complex-valued CNN, named the Shallow to Deep Feature Fusion Network (SDF2Net), is proposed for PolSAR image classification.
The results indicate that the proposed approach demonstrates improvements in overall accuracy, with 1.3% and 0.8% gains on the AIRSAR datasets and a 0.5% improvement on the ESAR dataset.
arXiv Detail & Related papers (2024-02-27T16:46:21Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - Confusing Image Quality Assessment: Towards Better Augmented Reality Experience [96.29124666702566]
We consider AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory.
A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs.
An objective metric termed CFIQA is also proposed to better evaluate the confusing image quality.
arXiv Detail & Related papers (2022-04-11T07:03:06Z) - Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z) - PeaceGAN: A GAN-based Multi-Task Learning Method for SAR Target Image Generation with a Pose Estimator and an Auxiliary Classifier [50.17500790309477]
We propose a novel GAN-based multi-task learning (MTL) method for SAR target image generation, called PeaceGAN.
PeaceGAN uses both pose angle and target class information, which makes it possible to produce SAR target images of desired target classes at intended pose angles.
arXiv Detail & Related papers (2021-03-29T10:03:09Z) - The QXS-SAROPT Dataset for Deep Learning in SAR-Optical Data Fusion [14.45289690639374]
We publish the QXS-SAROPT dataset to foster deep learning research in SAR-optical data fusion.
We show exemplary results for two representative applications, namely SAR-optical image matching and SAR ship detection boosted by cross-modal information from optical images.
arXiv Detail & Related papers (2021-03-15T10:22:46Z) - Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]
We develop a computationally efficient yet accurate network based on the proposed attentive auxiliary features (A$2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z) - SAR2SAR: a semi-supervised despeckling algorithm for SAR images [3.9490074068698]
A deep-learning algorithm with self-supervision, SAR2SAR, is proposed in this paper.
The strategy to adapt it to SAR despeckling is presented, based on compensating for temporal changes and a loss function adapted to the statistics of speckle.
Results on real images are discussed, to show the potential of the proposed algorithm.
arXiv Detail & Related papers (2020-06-26T15:07:28Z) - On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID [57.71601467271486]
This article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation.
We first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations.
Following the presented guidance, we also provide an example of building an RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset.
arXiv Detail & Related papers (2020-06-22T17:59:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.