Evaluating the Fitness of Ontologies for the Task of Question Generation
- URL: http://arxiv.org/abs/2504.07994v2
- Date: Wed, 27 Aug 2025 17:47:44 GMT
- Title: Evaluating the Fitness of Ontologies for the Task of Question Generation
- Authors: Samah Alkhuzaey, Floriana Grasso, Terry R. Payne, Valentina Tamma
- Abstract summary: This paper proposes a set of requirements and task-specific metrics for evaluating the fitness of ontologies for question generation tasks. A set of evaluation metrics has been derived from an expert assessment of questions generated by a question generation model. The analysis confirms that ontology characteristics significantly impact the effectiveness of question generation, with different ontologies exhibiting varying performance levels.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ontology-based question generation is an important application of semantic-aware systems that enables the creation of large question banks for diverse learning environments. The effectiveness of these systems, both in terms of the calibre and cognitive difficulty of the resulting questions, depends heavily on the quality and modelling approach of the underlying ontologies, making it crucial to assess their fitness for this task. To date, there has been no comprehensive investigation into the specific ontology aspects or characteristics that affect the question generation process. Therefore, this paper proposes a set of requirements and task-specific metrics for evaluating the fitness of ontologies for question generation tasks in pedagogical settings. Using the ROMEO methodology (a structured framework used for identifying task-specific metrics), a set of evaluation metrics has been derived from an expert assessment of questions generated by a question generation model. To validate the proposed metrics, we apply them to a set of ontologies previously used in question generation to illustrate how the metric scores align with and complement findings reported in earlier studies. The analysis confirms that ontology characteristics significantly impact the effectiveness of question generation, with different ontologies exhibiting varying performance levels. This highlights the importance of assessing ontology quality with respect to Automatic Question Generation (AQG) tasks.
Related papers
- Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models [6.0158981171030685]
Critical Questions Generation (CQs-Gen) aims to foster critical thinking by enabling systems to generate questions that expose underlying assumptions. Despite growing interest in this area, progress has been hindered by the lack of suitable datasets and automatic evaluation standards. This paper presents a comprehensive approach to support the development and benchmarking of systems for this task.
arXiv Detail & Related papers (2025-05-16T15:08:04Z)
- "There Is No Such Thing as a Dumb Question," But There Are Good Ones [4.962252439662465]
This study defines good questions and presents a systematic evaluation framework. We propose two key evaluation dimensions: appropriateness (sociolinguistic competence in context) and effectiveness. By incorporating dynamic contextual variables, our evaluation framework achieves structure and flexibility through semi-adaptive criteria.
arXiv Detail & Related papers (2025-05-15T03:12:28Z)
- Pitfalls of topology-aware image segmentation [81.19923502845441]
We identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts, and inappropriate use of evaluation metrics. We propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
arXiv Detail & Related papers (2024-12-19T08:11:42Z)
- UKTF: Unified Knowledge Tracing Framework for Subjective and Objective Assessments [3.378008889662775]
Knowledge tracing technology can establish knowledge state models based on learners' historical answer data.
This study proposes a unified knowledge tracing model that integrates both objective and subjective test questions.
arXiv Detail & Related papers (2024-11-08T04:58:19Z)
- AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
arXiv Detail & Related papers (2024-10-25T17:06:27Z)
- An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms [62.878616839799776]
We propose SynthRAG, an innovative framework designed to enhance Question Answering (QA) performance.
SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring.
An online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement.
arXiv Detail & Related papers (2024-10-23T09:14:57Z)
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs).
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
- A RAG Approach for Generating Competency Questions in Ontology Engineering [1.0044270899550196]
With the emergence of Large Language Models (LLMs), there arises the possibility to automate and enhance this process. We present a retrieval-augmented generation (RAG) approach that uses LLMs for the automatic generation of CQs. We conduct experiments using GPT-4 on two domain engineering tasks and compare results against ground-truth CQs constructed by domain experts.
arXiv Detail & Related papers (2024-09-13T13:34:32Z)
- Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity [0.0]
This paper presents a novel framework for domain-specific question difficulty estimation, leveraging a suite of NLP techniques and knowledge graph analysis.
We introduce four key parameters: Topic Retrieval Cost, Topic Salience, Topic Coherence, and Topic Superficiality.
A model trained on these features demonstrates the efficacy of our approach in predicting question difficulty.
arXiv Detail & Related papers (2024-08-23T05:40:35Z)
- Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM's Structured Questions for National Teacher Certification Exams [2.7363336723930756]
This study explores the potential of the large language model (LLM) ChatGLM for automatically generating structured questions for National Teacher Certification Exams (NTCE).
We guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees.
The research results indicate that the questions generated by ChatGLM exhibit a high level of rationality, scientificity, and practicality similar to those of the real exam questions.
arXiv Detail & Related papers (2024-08-19T13:32:14Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z)
- Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
- Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z)
- Latent Properties of Lifelong Learning Systems [59.50307752165016]
We introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms.
We validate the approach for estimating these properties via experiments on synthetic data.
arXiv Detail & Related papers (2022-07-28T20:58:13Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.