Related papers: Knowledge-aware Visual Question Generation for Remote Sensing Images

Knowledge-aware Visual Question Generation for Remote Sensing Images

URL: http://arxiv.org/abs/2602.19224v1
Date: Sun, 22 Feb 2026 15:18:01 GMT
Title: Knowledge-aware Visual Question Generation for Remote Sensing Images
Authors: Siran Li, Li Mi, Javiera Castillo-Navarro, Devis Tuia,
Abstract summary: We propose a knowledge-aware remote sensing visual question generation model, KRSVQG.<n>The model takes an image and a related knowledge triplet from external knowledge sources as inputs.<n>Results on two datasets demonstrate that KRSVQG outperforms existing methods.
Score: 18.383561647568502
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid development of remote sensing image archives, asking questions about images has become an effective way of gathering specific information or performing image retrieval. However, automatically generated image-based questions tend to be simplistic and template-based, which hinders the real deployment of question answering or visual dialogue systems. To enrich and diversify the questions, we propose a knowledge-aware remote sensing visual question generation model, KRSVQG, that incorporates external knowledge related to the image content to improve the quality and contextual understanding of the generated questions. The model takes an image and a related knowledge triplet from external knowledge sources as inputs and leverages image captioning as an intermediary representation to enhance the image grounding of the generated questions. To assess the performance of KRSVQG, we utilized two datasets that we manually annotated: NWPU-300 and TextRS-300. Results on these two datasets demonstrate that KRSVQG outperforms existing methods and leads to knowledge-enriched questions, grounded in both image and domain knowledge.

Related papers

Questions beyond Pixels: Integrating Commonsense Knowledge in Visual Question Generation for Remote Sensing [18.383561647568502]
We propose a Knowledge-aware Remote Sensing Visual Question Generation model (KRSVQG)<n>The proposed model incorporates related knowledge triplets from external knowledge sources to broaden the question content.<n>KRSVQG utilizes a vision-language pre-training and fine-tuning strategy, enabling the model's adaptation to low data regimes.
arXiv Detail & Related papers (2026-02-22T14:59:00Z)
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model [16.343935641777268]
We propose a novel Remote Sensing Retrieval-Augmented Generation (RS-RAG) framework, which consists of two key components.<n>The RS-RAG framework retrieves relevant knowledge based on image and/or text queries, and incorporates the retrieved content into a knowledge-augmented prompt.<n>We validated the effectiveness of our approach on three representative vision-language tasks, including image captioning, image classification, and visual question answering, where RS-RAG significantly outperformed state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T12:13:43Z)
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference. We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge [10.074327344317116]
We propose Q&A Prompts to equip AI models with robust cross-modality reasoning ability. We first use the image-answer pairs and the corresponding questions in a training set as inputs and outputs to train a visual question generation model. We then use an image tagging model to identify various instances and send packaged image-tag pairs into the visual question generation model to generate relevant questions with the extracted image tags as answers.
arXiv Detail & Related papers (2024-01-19T14:22:29Z)
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering [79.22069768972207]
We propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations. Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context. On two challenging VQA tasks, our method outperforms strong baseline VQA methods by 3.2% on VCR and 4.6% on GQA, suggesting its strength in performing concept-level reasoning.
arXiv Detail & Related papers (2022-05-23T17:55:34Z)
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG. This is the first large, humanly annotated dataset in which questions regarding images are tied to structured knowledge. We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z)
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering [18.926582410644375]
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions. We propose a novel model named dynamic knowledge memory enhanced multi-step graph reasoning (DMMGR) Our model achieves new state-of-the-art accuracy on the KRVQR and FVQA datasets.
arXiv Detail & Related papers (2022-03-06T15:19:39Z)
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA [51.639880603821446]
We propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions for knowledge-based VQA. We first convert the image into captions (or tags) that GPT-3 can understand, then adapt GPT-3 to solve the VQA task in a few-shot manner. By using only 16 examples, PICa surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset.
arXiv Detail & Related papers (2021-09-10T17:51:06Z)
Contextualized Knowledge-aware Attentive Neural Network: Enhancing Answer Selection with Knowledge [77.77684299758494]
We extensively investigate approaches to enhancing the answer selection model with external knowledge from knowledge graph (KG) First, we present a context-knowledge interaction learning framework, Knowledge-aware Neural Network (KNN), which learns the QA sentence representations by considering a tight interaction with the external knowledge from KG and the textual information. To handle the diversity and complexity of KG information, we propose a Contextualized Knowledge-aware Attentive Neural Network (CKANN), which improves the knowledge representation learning with structure information via a customized Graph Convolutional Network (GCN) and comprehensively learns context-based and knowledge-based sentence representation via
arXiv Detail & Related papers (2021-04-12T05:52:20Z)
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image. In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time. We tap into two types of knowledge representations and reasoning. First, implicit knowledge which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation. We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.