Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection
- URL: http://arxiv.org/abs/2410.23828v2
- Date: Wed, 13 Nov 2024 09:06:18 GMT
- Title: Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection
- Authors: Ke Li, Fuyu Dong, Di Wang, Shaofeng Li, Quan Wang, Xinbo Gao, Tat-Seng Chua
- Abstract summary: We introduce a new task named Change Detection Question Answering and Grounding (CDQAG).
CDQAG extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence.
We construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks.
- Abstract: Remote sensing change detection aims to perceive changes occurring on the Earth's surface from remote sensing data acquired in different periods, and to report these changes to users. However, most existing methods focus only on detecting change regions and lack the capability to interact with users to identify the changes they expect. In this paper, we introduce a new task named Change Detection Question Answering and Grounding (CDQAG), which extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence. To this end, we construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks. It encompasses 10 essential land-cover categories and 8 comprehensive question types, which provides a valuable and diverse dataset for remote sensing applications. Furthermore, we present VisTA, a simple yet effective baseline method that unifies the tasks of question answering and grounding by delivering both visual and textual answers. Our method achieves state-of-the-art results on both the classic change detection-based visual question answering (CDVQA) dataset and the proposed CDQAG dataset. Extensive qualitative and quantitative experimental results provide useful insights for developing better CDQAG models, and we hope that our work can inspire further research in this important yet underexplored field. The proposed benchmark dataset and method are available at https://github.com/like413/VisTA.
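The dataset's basic unit is a triplet: a question, a textual answer, and a pixel-level mask that grounds the answer. Below is a minimal sketch of that structure and of scoring both outputs together; the field names and the IoU-plus-accuracy metric are illustrative assumptions, not QAG-360K's actual schema or the paper's evaluation protocol.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CDQAGSample:
    image_t1: np.ndarray  # pre-change image,  shape (H, W, 3)
    image_t2: np.ndarray  # post-change image, shape (H, W, 3)
    question: str         # e.g. "What land-cover type has increased?"
    answer: str           # interpretable textual answer
    mask: np.ndarray      # binary visual-evidence mask, shape (H, W)

def score(pred_answer: str, pred_mask: np.ndarray, gt: CDQAGSample) -> dict:
    """Judge a prediction on both axes: the textual answer and the grounding mask."""
    inter = np.logical_and(pred_mask, gt.mask).sum()
    union = np.logical_or(pred_mask, gt.mask).sum()
    return {
        "answer_correct": pred_answer == gt.answer,
        "mask_iou": float(inter) / max(float(union), 1.0),
    }
```

Pairing the two scores is what lets a CDQAG benchmark reward models that answer correctly and point at the right pixels, rather than doing either alone.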
Related papers
- A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning [9.786907179872815]
The potential of vision and language remains underexplored in face forgery detection.
There is a need for a methodology that converts face forgery detection to a Visual Question Answering (VQA) task.
We propose a multi-staged approach that diverges from the traditional binary decision paradigm to address this gap.
arXiv Detail & Related papers (2024-10-01T08:16:40Z)
- ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection [16.62779899494721]
Change detection (CD) is a fundamental task in remote sensing (RS) that aims to detect semantic changes in the same geographical region across different time stamps.
We propose an effective Siamese-based framework to encode the semantic changes occurring in the bi-temporal RS images; a generic sketch of the Siamese idea follows this entry.
arXiv Detail & Related papers (2024-04-26T17:47:14Z)
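As a point of reference for the Siamese framework described above, here is a generic weight-shared Siamese change encoder; the layer sizes and the way the two feature maps are combined are assumptions for illustration, not ChangeBind's actual hybrid design.

```python
import torch
import torch.nn as nn

class SiameseChangeEncoder(nn.Module):
    """Weight-shared encoder over two time stamps plus a change head."""
    def __init__(self, in_ch: int = 3, dim: int = 64):
        super().__init__()
        # A single backbone processes both images, so both time stamps are
        # embedded in the same feature space (the Siamese property).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.change_head = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x_t1: torch.Tensor, x_t2: torch.Tensor) -> torch.Tensor:
        f1, f2 = self.backbone(x_t1), self.backbone(x_t2)
        # Combine the absolute feature difference (what changed) with the
        # elementwise product (what stayed correlated) before the change head.
        fused = torch.cat([torch.abs(f1 - f2), f1 * f2], dim=1)
        return self.change_head(fused)

enc = SiameseChangeEncoder()
t1, t2 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(enc(t1, t2).shape)  # torch.Size([1, 64, 32, 32])
```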
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
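UNK-VQA's augmentation step perturbs either the image or the question so that the ground-truth answer no longer holds. The toy function below illustrates the question side with a hand-made substitution lexicon; the actual perturbation strategies in the paper may differ.

```python
import random

# Toy substitution lexicon; purely illustrative, not from the paper.
REPLACEMENTS = {"left": "right", "red": "purple", "day": "night"}

def perturb_question(question: str, rng: random.Random) -> str:
    """Swap one content word so the original answer is no longer supported."""
    tokens = question.split()
    swappable = [i for i, t in enumerate(tokens) if t.lower() in REPLACEMENTS]
    if not swappable:
        return question  # nothing to swap; a caller could fall back to image edits
    i = rng.choice(swappable)
    tokens[i] = REPLACEMENTS[tokens[i].lower()]
    return " ".join(tokens)

rng = random.Random(0)
print(perturb_question("What is on the left of the red car?", rng))
# A model with good abstention ability should answer "unknown" to the result.
```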
- Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review [45.78958623050146]
Change detection is an essential and widely utilized task in remote sensing.
It aims to detect and analyze changes occurring in the same geographical area over time.
Deep learning has emerged as a powerful tool for feature extraction and addressing these challenges.
arXiv Detail & Related papers (2023-05-09T23:52:37Z)
- Weakly Supervised Grounding for VQA in Vision-Language Transformers [112.5344267669495]
This paper focuses on the problem of weakly supervised grounding in the context of visual question answering in transformers.
The approach leverages capsules formed by grouping the visual tokens in the visual encoder.
We evaluate our approach on the challenging GQA and VQA-HAT datasets for VQA grounding.
arXiv Detail & Related papers (2022-07-05T22:06:03Z)
- Change Detection Meets Visual Question Answering [23.63790450326685]
We introduce a novel task: change detection-based visual question answering (CDVQA) on multi-temporal aerial images.
In particular, multi-temporal images can be queried to obtain high-level change-based information according to content changes between two input images.
A baseline CDVQA framework is devised in this work, and it contains four parts: multi-temporal feature encoding, multi-temporal fusion, multi-modal fusion, and answer prediction (a skeleton of this decomposition is sketched after this entry).
arXiv Detail & Related papers (2021-12-12T22:39:20Z)
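The CDVQA summary above names the baseline's four stages explicitly. The skeleton below follows that decomposition; every module's internals (a tiny CNN, an embedding-bag question encoder, linear fusion layers) are placeholder assumptions, and only the stage ordering comes from the paper.

```python
import torch
import torch.nn as nn

class CDVQABaseline(nn.Module):
    """Four-stage CDVQA skeleton: encode, fuse time, fuse modalities, answer."""
    def __init__(self, vocab: int = 1000, dim: int = 256, n_answers: int = 10):
        super().__init__()
        self.visual = nn.Sequential(nn.Conv2d(3, dim, 3, 2, 1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.temporal_fusion = nn.Linear(2 * dim, dim)   # fuse t1/t2 features
        self.text = nn.EmbeddingBag(vocab, dim)          # toy question encoder
        self.multimodal_fusion = nn.Linear(2 * dim, dim)
        self.answer_head = nn.Linear(dim, n_answers)     # classify over answers

    def forward(self, img_t1, img_t2, question_ids):
        # 1) multi-temporal feature encoding
        v1, v2 = self.visual(img_t1), self.visual(img_t2)
        # 2) multi-temporal fusion
        v = torch.relu(self.temporal_fusion(torch.cat([v1, v2], dim=-1)))
        # 3) multi-modal fusion with the encoded question
        q = self.text(question_ids)
        h = torch.relu(self.multimodal_fusion(torch.cat([v, q], dim=-1)))
        # 4) answer prediction
        return self.answer_head(h)

model = CDVQABaseline()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64),
               torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 10])
```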
- Unsupervised Domain Adaption of Object Detectors: A Survey [87.08473838767235]
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications.
Learning highly accurate models relies on the availability of datasets with a large number of annotated images.
Because such models are tuned to their training distribution, performance drops drastically when they are evaluated on label-scarce datasets with visually distinct images.
arXiv Detail & Related papers (2021-05-27T23:34:06Z)
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules [85.98177341704675]
The problem of grounding VQA tasks has recently seen increased attention in the research community.
We propose a visual capsule module with a query-based selection mechanism of capsule features.
We show that integrating the proposed capsule module in existing VQA systems significantly improves their performance on the weakly supervised grounding task.
arXiv Detail & Related papers (2021-05-11T07:45:32Z)
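The capsule paper above hinges on a query-based selection mechanism over capsule features. Here is one minimal way such a mechanism could look, with the question embedding soft-selecting capsules via scaled dot-product scores; the dimensions and scoring rule are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class QueryCapsuleSelect(nn.Module):
    """Soft-select grouped visual capsules with a question-derived query."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, capsules: torch.Tensor, question: torch.Tensor):
        # capsules: (B, n_caps, dim) grouped visual tokens; question: (B, dim)
        q = self.query_proj(question).unsqueeze(1)            # (B, 1, dim)
        scores = (capsules * q).sum(-1) / capsules.size(-1) ** 0.5
        weights = torch.softmax(scores, dim=-1)               # relevance per capsule
        selected = (weights.unsqueeze(-1) * capsules).sum(1)  # weighted pooling
        return selected, weights

sel = QueryCapsuleSelect()
feats, w = sel(torch.randn(2, 8, 256), torch.randn(2, 256))
print(feats.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 8])
```

Returning the selection weights alongside the pooled feature is the natural hook for weak grounding: the weights indicate which visual groups answered the question.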
- DASNet: Dual attentive fully convolutional siamese networks for change detection of high resolution satellite images [17.839181739760676]
The research objective is to identify the change information of interest and filter out irrelevant change information as interference factors.
Recently, the rise of deep learning has provided new tools for change detection, which have yielded impressive results.
We propose a new method, namely, dual attentive fully convolutional Siamese networks (DASNet) for change detection in high-resolution images.
arXiv Detail & Related papers (2020-03-07T16:57:10Z)
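DASNet's name points at a dual attention design on top of a fully convolutional Siamese backbone. The module below pairs a channel-attention branch with a spatial-attention branch in that spirit; it is a generic sketch, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention on one feature map."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.channel_fc = nn.Sequential(nn.Linear(ch, ch // 4), nn.ReLU(),
                                        nn.Linear(ch // 4, ch), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3),
                                          nn.Sigmoid())

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # Channel attention: reweight feature maps by their global importance.
        w_c = self.channel_fc(f.mean(dim=(2, 3)))[:, :, None, None]
        f = f * w_c
        # Spatial attention: highlight locations using pooled channel statistics.
        pooled = torch.cat([f.mean(1, keepdim=True), f.amax(1, keepdim=True)], dim=1)
        return f * self.spatial_conv(pooled)

att = DualAttention()
print(att(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```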
This list is automatically generated from the titles and abstracts of the papers on this site.