DiagramQG: Concept-Focused Diagram Question Generation via Hierarchical Knowledge Integration
- URL: http://arxiv.org/abs/2411.17771v3
- Date: Mon, 10 Mar 2025 07:48:31 GMT
- Title: DiagramQG: Concept-Focused Diagram Question Generation via Hierarchical Knowledge Integration
- Authors: Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Basura Fernando, Jun Liu
- Abstract summary: We construct a dataset containing 8,372 diagrams and 19,475 questions across various subjects. We present the Hierarchical Knowledge Integration framework for Diagram Question Generation (HKI-DQG) as a strong baseline. We evaluate the performance of existing VQG models, open-source and closed-source vision-language models, and HKI-DQG on the DiagramQG dataset.
- Score: 27.549301875569736
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual Question Generation (VQG) has gained significant attention due to its potential in educational applications. However, VQG research mainly focuses on natural images, largely neglecting diagrams in educational materials used to assess students' conceptual understanding. To address this gap, we construct DiagramQG, a dataset containing 8,372 diagrams and 19,475 questions across various subjects. DiagramQG introduces concept and target text constraints, guiding the model to generate concept-focused questions for educational purposes. Meanwhile, we present the Hierarchical Knowledge Integration framework for Diagram Question Generation (HKI-DQG) as a strong baseline. This framework obtains multi-scale patches of diagrams and acquires knowledge using a visual language model with frozen parameters. It then integrates knowledge, text constraints, and patches to generate concept-focused questions. We evaluate the performance of existing VQG models, open-source and closed-source vision-language models, and HKI-DQG on the DiagramQG dataset. Our novel HKI-DQG consistently outperforms existing methods, demonstrating that it serves as a strong baseline. Furthermore, we apply HKI-DQG to four other VQG datasets of natural images, namely VQG-COCO, K-VQG, OK-VQA, and A-OKVQA, achieving state-of-the-art performance.
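The abstract describes the HKI-DQG pipeline only at a high level: extract multi-scale patches from the diagram, acquire concept-related knowledge with a frozen vision-language model, then fuse knowledge, text constraints, and patches to generate a question. The sketch below is a minimal, hypothetical illustration of that flow, not the authors' implementation; the `frozen_vlm` and `generator` objects and their `generate(image=..., prompt=...)` interface are assumed placeholders, and the patch extraction simply uses PIL grid crops.

```python
# Hypothetical sketch of the HKI-DQG pipeline described in the abstract.
# Model objects and their interfaces are illustrative assumptions only.
from dataclasses import dataclass
from typing import List

from PIL import Image


@dataclass
class Constraints:
    concept: str      # concept the question should assess, e.g. "photosynthesis"
    target_text: str  # target text constraint from DiagramQG


def extract_multiscale_patches(diagram: Image.Image, scales=(1, 2, 4)) -> List[Image.Image]:
    """Split the diagram into grid patches at several resolutions (scale 1 = whole diagram)."""
    patches = []
    w, h = diagram.size
    for s in scales:
        pw, ph = w // s, h // s
        for i in range(s):
            for j in range(s):
                patches.append(diagram.crop((i * pw, j * ph, (i + 1) * pw, (j + 1) * ph)))
    return patches


def acquire_knowledge(frozen_vlm, patches: List[Image.Image], constraints: Constraints) -> List[str]:
    """Query a frozen vision-language model (placeholder object) for concept-related facts per patch."""
    prompt = f"Describe how this region relates to the concept '{constraints.concept}'."
    return [frozen_vlm.generate(image=patch, prompt=prompt) for patch in patches]


def generate_question(generator, diagram: Image.Image, constraints: Constraints, knowledge: List[str]) -> str:
    """Fuse acquired knowledge, text constraints, and the diagram into one concept-focused question."""
    prompt = (
        f"Concept: {constraints.concept}\n"
        f"Target: {constraints.target_text}\n"
        "Knowledge:\n" + "\n".join(knowledge) + "\n"
        "Write one concept-focused question about the diagram."
    )
    return generator.generate(image=diagram, prompt=prompt)
```

The grid-crop patching and prompt wording here are design choices made for illustration; the paper's actual patch selection and knowledge-integration steps may differ.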
Related papers
- A Survey on Neural Question Generation: Methods, Applications, and Prospects [56.97451350691765]
The survey begins with an overview of NQG's background, encompassing the task's problem formulation.
It then methodically classifies NQG approaches into three predominant categories: structured NQG, unstructured NQG, and hybrid NQG.
The survey culminates with a forward-looking perspective on the trajectory of NQG, identifying emergent research trends and prospective developmental paths.
arXiv Detail & Related papers (2024-02-28T11:57:12Z) - ConVQG: Contrastive Visual Question Generation with Multimodal Guidance [20.009626292937995]
We propose Contrastive Visual Question Generation (ConVQG) to generate image-grounded, text-guided, and knowledge-rich questions.
Experiments on knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-02-20T09:20:30Z) - Language Guided Visual Question Answering: Elevate Your Multimodal
Language Model Using Knowledge-Enriched Prompts [54.072432123447854]
Visual question answering (VQA) is the task of answering questions about an image.
Answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image.
We propose a framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc., to answer questions more accurately.
arXiv Detail & Related papers (2023-10-31T03:54:11Z) - Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation [64.64849950642619]
We develop an evaluation framework inspired by formal semantics for evaluating text-to-image models.
We show that Davidsonian Scene Graph (DSG) produces atomic and unique questions organized in dependency graphs.
We also present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts.
arXiv Detail & Related papers (2023-10-27T16:20:10Z) - Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets [5.45761450227064]
We propose a new Few-Shot Visual Question Generation (FS-VQG) task and provide a comprehensive benchmark for it.
We evaluate various existing VQG approaches as well as popular few-shot solutions based on meta-learning and self-supervised strategies for the FS-VQG task.
Several important findings emerge from our experiments that shed light on the limits of current models in few-shot vision and language generation tasks.
arXiv Detail & Related papers (2022-10-13T15:01:15Z) - VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering [79.22069768972207]
We propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations.
Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context.
On two challenging VQA tasks, our method outperforms strong baseline VQA methods by 3.2% on VCR and 4.6% on GQA, suggesting its strength in performing concept-level reasoning.
arXiv Detail & Related papers (2022-05-23T17:55:34Z) - K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions regarding images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z) - QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering [122.84513233992422]
We propose a new model, QA-GNN, which addresses the problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs).
We show its improvement over existing LM and LM+KG models, as well as its capability to perform interpretable and structured reasoning.
arXiv Detail & Related papers (2021-04-13T17:32:51Z) - EQG-RACE: Examination-Type Question Generation [21.17100754955864]
We propose an innovative Examination-type Question Generation approach (EQG-RACE) to generate exam-like questions based on a dataset extracted from RACE.
Two main strategies are employed in EQG-RACE for dealing with discrete answer information and reasoning among long contexts.
Experimental results show state-of-the-art performance for EQG-RACE, which clearly outperforms the baselines.
arXiv Detail & Related papers (2020-12-11T03:52:17Z) - Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z)