Multiple-Choice Question Generation: Towards an Automated Assessment
Framework
- URL: http://arxiv.org/abs/2209.11830v1
- Date: Fri, 23 Sep 2022 19:51:46 GMT
- Title: Multiple-Choice Question Generation: Towards an Automated Assessment
Framework
- Authors: Vatsal Raina and Mark Gales
- Abstract summary: Transformer-based pretrained language models have demonstrated the ability to produce appropriate questions from a context paragraph.
We focus on a fully automated multiple-choice question generation (MCQG) system where both the question and possible answers must be generated from the context paragraph.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated question generation is an important approach to enable
personalisation of English comprehension assessment. Recently,
transformer-based pretrained language models have demonstrated the ability to
produce appropriate questions from a context paragraph. Typically, these
systems are evaluated against a reference set of manually generated questions
using n-gram based metrics, or manual qualitative assessment. Here, we focus on
a fully automated multiple-choice question generation (MCQG) system where both
the question and possible answers must be generated from the context paragraph.
Applying n-gram based approaches is challenging for this form of system as the
reference set is unlikely to capture the full range of possible questions and
answer options. Conversely, manual assessment scales poorly and is expensive for
MCQG system development. In this work, we propose a set of performance criteria
that assess different aspects of the generated multiple-choice questions of
interest. These qualities include: grammatical correctness, answerability,
diversity and complexity. Initial systems for each of these metrics are
described, and individually evaluated on standard multiple-choice reading
comprehension corpora.
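The four criteria are described only at a high level in this abstract, but two of them can be approximated with off-the-shelf components. Below is a minimal sketch of answerability checking with a SQuAD2.0-style QA model (the same kind of check the mQG paper in the list below uses) plus a distinct-n diversity score; the model choice, thresholding, and scoring here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of two of the proposed MCQG quality criteria:
# answerability (can a QA model find an answer in the context?)
# and diversity (how much do the generated questions overlap?).
# Model choice and scoring are illustrative assumptions, not the
# implementation described in the paper.
from collections import Counter

from transformers import pipeline

# A SQuAD2.0-style model can return an empty span, i.e. it can
# declare a question unanswerable from the given context.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answerability(question: str, context: str) -> float:
    """Confidence that `question` is answerable from `context`."""
    result = qa(question=question, context=context,
                handle_impossible_answer=True)
    # An empty answer span means the model voted "unanswerable".
    return 0.0 if not result["answer"] else result["score"]

def distinct_n(questions: list[str], n: int = 2) -> float:
    """Unique-n-gram ratio over a question set; higher = more diverse."""
    ngrams = Counter()
    for q in questions:
        tokens = q.lower().split()
        ngrams.update(zip(*(tokens[i:] for i in range(n))))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

context = ("The transformer architecture relies on self-attention "
           "rather than recurrence to model long-range dependencies.")
questions = [
    "What does the transformer rely on instead of recurrence?",
    "Which mechanism lets the model capture long-range dependencies?",
]
for q in questions:
    print(f"answerability={answerability(q, context):.2f}  {q}")
print(f"distinct-2 diversity: {distinct_n(questions):.2f}")
```

A SQuAD2.0-style model is chosen here because it can abstain: an empty predicted span directly signals that a generated question is not answerable from its context.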
Related papers
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics compared with strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z)
- Automating question generation from educational text [1.9325905076281444]
The use of question-based activities (QBAs) is widespread in education, forming an integral part of the learning and assessment process.
We design and evaluate an automated question generation tool for formative and summative assessment in schools.
arXiv Detail & Related papers (2023-09-26T15:18:44Z)
- SQuArE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches (a minimal generation sketch in this style appears after this list).
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- SkillQG: Learning to Generate Question for Reading Comprehension Assessment [54.48031346496593]
We present a question generation framework with controllable comprehension types for assessing and improving machine reading comprehension models.
We first frame the comprehension type of questions based on a hierarchical skill-based schema, then formulate SkillQG as a skill-conditioned question generator.
Empirical results demonstrate that SkillQG outperforms baselines in terms of quality, relevance, and skill-controllability.
arXiv Detail & Related papers (2023-05-08T14:40:48Z)
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion [57.43781399856913]
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
arXiv Detail & Related papers (2022-10-12T03:53:12Z)
- Evaluation of Question Answering Systems: Complexity of judging a natural language [3.4771957347698583]
Question answering (QA) systems are among the most important and rapidly developing research topics in natural language processing (NLP).
This survey attempts to provide a systematic overview of the general framework of QA, QA paradigms, benchmark datasets, and assessment techniques for a quantitative evaluation of QA systems.
arXiv Detail & Related papers (2022-09-10T12:29:04Z)
- Evaluating Mixed-initiative Conversational Search Systems via User Simulation [9.066817876491053]
We propose a conversational User Simulator, called USi, for automatic evaluation of such search systems.
We show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
arXiv Detail & Related papers (2022-04-17T16:27:33Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
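Several papers in the list above generate questions with fine-tuned sequence-to-sequence LMs (the QAG baselines, SkillQG, mQG). As a generation-side companion to the evaluation sketch earlier, here is a minimal sketch of producing question candidates from a context paragraph; the checkpoint name and its answer-highlight prompt format are assumptions for illustration, and the distractor options that full MCQG also requires are omitted.

```python
# Minimal seq2seq question-generation sketch, in the spirit of the
# QAG baselines above. The checkpoint and its <hl> answer-highlight
# prompt format are assumptions; full MCQG would additionally need
# distractor (wrong-answer) generation, which is omitted here.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "valhalla/t5-base-qg-hl"  # assumed publicly available QG checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# The target answer span is wrapped in <hl> markers so the model
# knows what the generated question should ask about.
context = ("The transformer architecture relies on <hl> self-attention <hl> "
           "rather than recurrence to model long-range dependencies.")
inputs = tokenizer("generate question: " + context, return_tensors="pt")

# Return several beam candidates so answerability and diversity
# checks (see the earlier sketch) have something to filter.
outputs = model.generate(**inputs, max_new_tokens=48, num_beams=4,
                         num_return_sequences=4, early_stopping=True)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```

Generating several candidates and filtering them with automatic answerability and diversity checks is one plausible way to pair an off-the-shelf generator with the evaluation criteria proposed in the main paper.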
This list is automatically generated from the titles and abstracts of the papers on this site.