Copy-Move Forgery Detection and Question Answering for Remote Sensing Image
- URL: http://arxiv.org/abs/2412.02575v1
- Date: Tue, 03 Dec 2024 17:02:40 GMT
- Title: Copy-Move Forgery Detection and Question Answering for Remote Sensing Image
- Authors: Ze Zhang, Enyuan Zhao, Ziyi Wan, Jie Nie, Xinyue Liang, Lei Huang,
- Abstract summary: This paper introduces the task of Remote Sensing Copy-Move Question Answering (RSCMQA)
Unlike traditional Remote Sensing Visual Question Answering (RSVQA), RSCMQA focuses on interpreting complex tampering scenarios.
We have developed an accurate and comprehensive global dataset for remote sensing image copy-move question answering, named RS-CMQA-2.1M.
- Score: 14.436863648867904
- License:
- Abstract: This paper introduces the task of Remote Sensing Copy-Move Question Answering (RSCMQA). Unlike traditional Remote Sensing Visual Question Answering (RSVQA), RSCMQA focuses on interpreting complex tampering scenarios and inferring relationships between objects. Based on the practical needs of national defense security and land resource monitoring, we have developed an accurate and comprehensive global dataset for remote sensing image copy-move question answering, named RS-CMQA-2.1M. These images were collected from 29 different regions across 14 countries. Additionally, we have refined a balanced dataset, RS-CMQA-B, to address the long-standing issue of long-tail data in the remote sensing field. Furthermore, we propose a region-discriminative guided multimodal CMQA model, which enhances the accuracy of answering questions about tampered images by leveraging prompt about the differences and connections between the source and tampered domains. Extensive experiments demonstrate that our method provides a stronger benchmark for RS-CMQA compared to general VQA and RSVQA models. Our dataset and code are available at https://github.com/shenyedepisa/RSCMQA.
Related papers
- Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection [82.65760006883248]
We introduce a new task named Change Detection Question Answering and Grounding (CDQAG)
CDQAG extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence.
We construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks.
arXiv Detail & Related papers (2024-10-31T11:20:13Z) - RS-Mamba for Large Remote Sensing Image Dense Prediction [58.12667617617306]
We propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images.
RSM is specifically designed to capture the global context of remote sensing images with linear complexity.
Our model achieves better efficiency and accuracy than transformer-based models on large remote sensing images.
arXiv Detail & Related papers (2024-04-03T12:06:01Z) - GeoChat: Grounded Large Vision-Language Model for Remote Sensing [65.78360056991247]
We propose GeoChat - the first versatile remote sensing Large Vision-Language Models (VLMs) that offers multitask conversational capabilities with high-resolution RS images.
Specifically, GeoChat can answer image-level queries but also accepts region inputs to hold region-specific dialogue.
GeoChat demonstrates robust zero-shot performance on various RS tasks, e.g., image and region captioning, visual question answering, scene classification, visually grounded conversations and referring detection.
arXiv Detail & Related papers (2023-11-24T18:59:10Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Fine-grained Late-interaction Multi-modal Retrieval for Retrieval
Augmented Visual Question Answering [56.96857992123026]
Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to utilize knowledge from external knowledge bases to answer visually-grounded questions.
This paper proposes Fine-grained Late-interaction Multi-modal Retrieval (FLMR) which significantly improves knowledge retrieval in RA-VQA.
arXiv Detail & Related papers (2023-09-29T10:54:10Z) - Visual Question Answering in Remote Sensing with Cross-Attention and
Multimodal Information Bottleneck [14.719648367178259]
We deal with the problem of visual question answering (VQA) in remote sensing.
While remotely sensed images contain information significant for the task of identification and object detection, they pose a great challenge in their processing because of high dimensionality, volume and redundancy.
We propose a cross attention based approach combined with information. The CNN-LSTM based cross-attention highlights the information in the image and language modalities and establishes a connection between the two, while information learns a low dimensional layer, that has all the relevant information required to carry out the VQA task.
arXiv Detail & Related papers (2023-06-25T15:09:21Z) - Multi-Modal Fusion Transformer for Visual Question Answering in Remote
Sensing [1.491109220586182]
VQA allows a user to formulate a free-form question concerning the content of RS images to extract generic information.
Most of the current fusion approaches use modality-specific representations in their fusion modules instead of joint representation learning.
We propose a multi-modal transformer-based architecture to overcome this issue.
arXiv Detail & Related papers (2022-10-10T09:20:33Z) - X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for
Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z) - RSVQA: Visual Question Answering for Remote Sensing Data [6.473307489370171]
This paper introduces the task of visual question answering for remote sensing data (RSVQA)
We use questions formulated in natural language and use them to interact with the images.
The datasets can be used to train (when using supervised methods) and evaluate models to solve the RSVQA task.
arXiv Detail & Related papers (2020-03-16T17:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.