DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams
- URL: http://arxiv.org/abs/2411.17771v1
- Date: Tue, 26 Nov 2024 08:27:50 GMT
- Title: DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams
- Authors: Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Jun Liu
- Abstract summary: We introduce DiagramQG, a dataset containing 8,372 diagrams and 19,475 questions across various subjects.
We present the Hierarchical Knowledge Integration framework for Diagram Question Generation (HKI-DQG) as a strong baseline.
We evaluate the performance of existing VQG models, open-source and closed-source vision-language models, and HKI-DQG on the DiagramQG dataset.
- Score: 19.310782704527192
- Abstract: Visual Question Generation (VQG) has gained significant attention due to its potential in educational applications. However, VQG research mainly focuses on natural images, neglecting the diagrams in educational materials that are used to assess students' conceptual understanding. To address this gap, we introduce DiagramQG, a dataset containing 8,372 diagrams and 19,475 questions across various subjects. DiagramQG introduces concept and target text constraints, guiding the model to generate concept-focused questions for educational purposes. We also present the Hierarchical Knowledge Integration framework for Diagram Question Generation (HKI-DQG) as a strong baseline. This framework obtains multi-scale patches of diagrams and acquires knowledge using a visual language model with frozen parameters. It then integrates knowledge, text constraints, and patches to generate concept-focused questions. We evaluate the performance of existing VQG models, open-source and closed-source vision-language models, and HKI-DQG on the DiagramQG dataset. HKI-DQG outperforms existing methods, demonstrating that it serves as a strong baseline. Furthermore, to assess its generalizability, we apply HKI-DQG to two other VQG datasets of natural images, namely VQG-COCO and K-VQG, achieving state-of-the-art performance. The dataset and code are available at https://dxzxy12138.github.io/diagramqg-home.
Related papers
- ConVQG: Contrastive Visual Question Generation with Multimodal Guidance [20.009626292937995]
We propose Contrastive Visual Question Generation (ConVQG) to generate image-grounded, text-guided, and knowledge-rich questions.
Experiments on knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-02-20T09:20:30Z)
- Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts [54.072432123447854]
Visual question answering (VQA) is the task of answering questions about an image.
Answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image.
We propose a framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc., to answer questions more accurately.
arXiv Detail & Related papers (2023-10-31T03:54:11Z)
- Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation [64.64849950642619]
We develop an evaluation framework inspired by formal semantics for evaluating text-to-image models.
We show that Davidsonian Scene Graph (DSG) produces atomic and unique questions organized in dependency graphs.
We also present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts.
arXiv Detail & Related papers (2023-10-27T16:20:10Z)
- Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets [5.45761450227064]
We propose a new Few-Shot Visual Question Generation (FS-VQG) task and provide a comprehensive benchmark for it.
We evaluate various existing VQG approaches as well as popular few-shot solutions based on meta-learning and self-supervised strategies for the FS-VQG task.
Several important findings emerge from our experiments that shed light on the limits of current models in few-shot vision and language generation tasks.
arXiv Detail & Related papers (2022-10-13T15:01:15Z)
- VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering [79.22069768972207]
We propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations.
Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context.
On two challenging VQA tasks, our method outperforms strong baseline VQA methods by 3.2% on VCR and 4.6% on GQA, suggesting its strength in performing concept-level reasoning.
arXiv Detail & Related papers (2022-05-23T17:55:34Z)
- K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions regarding images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z)
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering [122.84513233992422]
We propose a new model, QA-GNN, which addresses the problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs).
We show its improvement over existing LM and LM+KG models, as well as its capability to perform interpretable and structured reasoning.
arXiv Detail & Related papers (2021-04-13T17:32:51Z)
- EQG-RACE: Examination-Type Question Generation [21.17100754955864]
We propose an innovative Examination-type Question Generation approach (EQG-RACE) to generate exam-like questions based on a dataset extracted from RACE.
Two main strategies are employed in EQG-RACE for dealing with discrete answer information and reasoning among long contexts.
Experimental results show a state-of-the-art performance of EQG-RACE, which is apparently superior to the baselines.
arXiv Detail & Related papers (2020-12-11T03:52:17Z)
- Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z)
- Toward Subgraph-Guided Knowledge Graph Question Generation with Graph Neural Networks [53.58077686470096]
Knowledge graph (KG) question generation (QG) aims to generate natural language questions from KGs and target answers.
In this work, we focus on a more realistic setting where we aim to generate questions from a KG subgraph and target answers.
arXiv Detail & Related papers (2020-04-13T15:43:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.