Quiz Design Task: Helping Teachers Create Quizzes with Automated
Question Generation
- URL: http://arxiv.org/abs/2205.01730v1
- Date: Tue, 3 May 2022 18:59:03 GMT
- Title: Quiz Design Task: Helping Teachers Create Quizzes with Automated
Question Generation
- Authors: Philippe Laban and Chien-Sheng Wu and Lidiya Murakhovs'ka and Wenhao
Liu and Caiming Xiong
- Abstract summary: This paper focuses on the use case of helping teachers automate the generation of reading comprehension quizzes.
In our study, teachers building a quiz receive question suggestions, which they can either accept or refuse with a reason.
- Score: 87.34509878569916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question generation (QGen) models are often evaluated with standardized NLG
metrics that are based on n-gram overlap. In this paper, we measure whether
these metric improvements translate to gains in a practical setting, focusing
on the use case of helping teachers automate the generation of reading
comprehension quizzes. In our study, teachers building a quiz receive question
suggestions, which they can either accept or refuse with a reason. Even though
we find that recent progress in QGen leads to a significant increase in
question acceptance rates, there is still large room for improvement, with the
best model having only 68.4% of its questions accepted by the ten teachers who
participated in our study. We then leverage the annotations we collected to
analyze standard NLG metrics and find that model performance has reached
projected upper-bounds, suggesting new automatic metrics are needed to guide
QGen research forward.
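For readers unfamiliar with these measures, the sketch below contrasts a standard n-gram overlap score with an acceptance-rate statistic of the kind reported in the study. It is a minimal illustration: NLTK's BLEU stands in for the paper's metrics, and the annotation records are made-up placeholders, not data from the study.

```python
# Sketch: contrast an n-gram overlap score with a teacher acceptance rate.
# NLTK's sentence_bleu stands in for the standardized NLG metrics mentioned
# in the abstract; the annotation records are made-up placeholders.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "What did the main character find in the attic?".split()
generated = "What did the character discover in the attic?".split()

# n-gram overlap score (BLEU with smoothing, since the sentences are short)
bleu = sentence_bleu([reference], generated,
                     smoothing_function=SmoothingFunction().method1)

# Acceptance rate over hypothetical teacher annotations:
# each record is (question_id, accepted, refusal_reason or None).
annotations = [
    ("q1", True, None),
    ("q2", False, "disfluent"),
    ("q3", True, None),
    ("q4", False, "off-topic"),
]
acceptance_rate = sum(accepted for _, accepted, _ in annotations) / len(annotations)

print(f"BLEU overlap: {bleu:.3f}, acceptance rate: {acceptance_rate:.1%}")
```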
Related papers
- How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes [5.487297537295827]
This paper applies a large language model-based QG approach where questions are generated with learning goals derived from Bloom's taxonomy.
The results demonstrate that teachers prefer to write quizzes with automatically generated questions, and that such quizzes have no loss in quality compared to handwritten versions.
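As a rough sketch of this idea (the level descriptions and prompt wording below are illustrative assumptions, not the paper's actual prompts), a Bloom's-taxonomy learning goal can be folded into a question-generation prompt like so:

```python
# Sketch: compose a QG prompt around a Bloom's-taxonomy learning goal.
# The level descriptions and prompt wording are illustrative, not the paper's.
BLOOM_LEVELS = {
    "remember": "recall facts stated in the text",
    "understand": "explain ideas from the text in their own words",
    "apply": "use information from the text in a new situation",
    "analyze": "draw connections between parts of the text",
}

def build_prompt(passage: str, level: str) -> str:
    goal = BLOOM_LEVELS[level]
    return (
        f"Passage:\n{passage}\n\n"
        f"Write one reading-comprehension question that asks students to {goal} "
        f"(Bloom's taxonomy level: {level})."
    )

prompt = build_prompt(
    "Photosynthesis converts light energy into chemical energy.", "understand")
print(prompt)  # send this prompt to the LLM of your choice
```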
arXiv Detail & Related papers (2024-01-11T13:47:13Z)
- Automatic Answerability Evaluation for Question Generation [32.1067137848404]
This work proposes PMAN, a novel automatic evaluation metric to assess whether the generated questions are answerable by the reference answers.
Our implementation of a GPT-based QG model achieves state-of-the-art performance in generating answerable questions.
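For illustration only, one crude way to approximate an answerability check in this spirit is to run an off-the-shelf extractive QA model on the generated question and see whether it recovers the reference answer. This substitutes a generic Hugging Face pipeline for the paper's GPT-based prompting, and the example data is invented.

```python
# Sketch: a crude answerability check. An off-the-shelf extractive QA pipeline
# replaces the paper's GPT-based prompting; context, question, and answer are
# invented examples.
from transformers import pipeline

qa = pipeline("question-answering")  # downloads a default SQuAD-style model

context = "Marie Curie won the Nobel Prize in Physics in 1903."
generated_question = "When did Marie Curie win the Nobel Prize in Physics?"
reference_answer = "1903"

prediction = qa(question=generated_question, context=context)
# Treat the question as answerable if the QA model recovers the reference answer.
answerable = reference_answer.lower() in prediction["answer"].lower()
print(prediction["answer"], answerable)
```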
arXiv Detail & Related papers (2023-09-22T00:13:07Z)
- Distractor generation for multiple-choice questions with predictive prompting and large language models [21.233186754403093]
Large Language Models (LLMs) such as ChatGPT have demonstrated remarkable performance across various tasks.
We propose a strategy for guiding LLMs in generating relevant distractors by prompting them with question items automatically retrieved from a question bank.
We found that on average 53% of the generated distractors presented to the teachers were rated as high-quality, i.e., suitable for immediate use as is.
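A minimal sketch of the retrieval-plus-prompting idea is shown below; the tiny question bank, the TF-IDF retrieval, and the prompt wording are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: retrieve similar items from a question bank to use as in-context
# examples when prompting an LLM for distractors. The bank, the TF-IDF
# similarity, and the prompt wording are all illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question_bank = [
    {"stem": "What is the capital of France?", "distractors": ["Lyon", "Marseille", "Nice"]},
    {"stem": "What is the capital of Italy?", "distractors": ["Milan", "Naples", "Turin"]},
    {"stem": "Which gas do plants absorb?", "distractors": ["Oxygen", "Nitrogen", "Helium"]},
]
new_item = "What is the capital of Spain?"

vec = TfidfVectorizer().fit([q["stem"] for q in question_bank] + [new_item])
sims = cosine_similarity(vec.transform([new_item]),
                         vec.transform([q["stem"] for q in question_bank]))[0]
examples = [question_bank[i] for i in sims.argsort()[::-1][:2]]  # top-2 most similar

prompt = "Write three plausible but incorrect options for the last question.\n\n"
for ex in examples:
    prompt += f"Question: {ex['stem']}\nDistractors: {', '.join(ex['distractors'])}\n\n"
prompt += f"Question: {new_item}\nDistractors:"
print(prompt)  # pass to an LLM such as ChatGPT
```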
arXiv Detail & Related papers (2023-07-30T23:15:28Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
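As a hedged sketch of what an end-to-end QAG call can look like with a sequence-to-sequence LM (the checkpoint name and the serialized question/answer output format below are assumptions for illustration, not the paper's released models):

```python
# Sketch of an end-to-end QAG call: a single seq2seq LM maps a context to a
# serialized list of question-answer pairs. The checkpoint name and the
# "question: ... answer: ..." output convention are assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "your-org/qag-t5-base"  # hypothetical end-to-end QAG checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

context = "The water cycle describes how water evaporates, condenses, and falls as rain."
inputs = tokenizer("generate question-answer pairs: " + context, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Expected (by the convention assumed here) output such as:
# "question: What does the water cycle describe? answer: how water evaporates, ..."
print(decoded)
```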
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Learning Answer Generation using Supervision from Automatic Question Answering Evaluators [98.9267570170737]
We propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA).
We evaluate our proposed methods on two academic and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.
arXiv Detail & Related papers (2023-05-24T16:57:04Z)
- Hurdles to Progress in Long-form Question Answering [34.805039943215284]
We first design a new system that relies on sparse attention and contrastive retriever learning to achieve state-of-the-art performance.
We then show that the task formulation raises fundamental challenges regarding evaluation and dataset creation.
arXiv Detail & Related papers (2021-03-10T20:32:30Z)
- Exploring Question-Specific Rewards for Generating Deep Questions [42.243227323241584]
We design three different rewards that aim to improve the fluency, relevance, and answerability of generated questions.
We find that optimizing question-specific rewards generally leads to better performance in automatic evaluation metrics.
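A minimal sketch of how such question-specific rewards might be combined into a single training signal is given below; the placeholder scorers and weights are assumptions, since the paper's actual reward definitions come from trained models.

```python
# Sketch: combine per-question reward signals into one scalar for RL training.
# The scorers below are toy placeholders; the paper's fluency, relevance, and
# answerability rewards are model-based.
from typing import Callable, Dict

def combined_reward(question: str, passage: str,
                    scorers: Dict[str, Callable[[str, str], float]],
                    weights: Dict[str, float]) -> float:
    return sum(weights[name] * score(question, passage)
               for name, score in scorers.items())

scorers = {
    "fluency": lambda q, p: 1.0,        # e.g. derived from an LM's perplexity
    "relevance": lambda q, p: len(set(q.lower().split()) & set(p.lower().split()))
                              / max(len(q.split()), 1),
    "answerability": lambda q, p: 0.5,  # e.g. a QA model's confidence on (q, p)
}
weights = {"fluency": 0.3, "relevance": 0.3, "answerability": 0.4}

print(combined_reward("What drives the water cycle?",
                      "The sun drives the water cycle by heating ocean water.",
                      scorers, weights))
```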
arXiv Detail & Related papers (2020-11-02T16:37:30Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- KPQA: A Metric for Generative Question Answering Using Keyphrase Weights [64.54593491919248]
KPQA is a new metric for evaluating the correctness of generative question answering systems.
Our new metric assigns different weights to each token via keyphrase prediction.
We show that our proposed metric has a significantly higher correlation with human judgments than existing metrics.
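The sketch below shows a keyphrase-weighted token F1 in this spirit; real KPQA derives the per-token weights from a trained keyphrase-prediction model, whereas the weights here are hand-assigned to keep the example self-contained.

```python
# Sketch: keyphrase-weighted token F1 in the spirit of KPQA. Real KPQA learns
# per-token weights with a keyphrase-prediction model; here the weights are
# hand-assigned so the example stays self-contained.
from collections import Counter
from typing import Dict

def weighted_f1(pred: str, ref: str, weights: Dict[str, float]) -> float:
    w = lambda tok: weights.get(tok, 0.1)  # low default weight for other tokens
    pred_toks, ref_toks = pred.lower().split(), ref.lower().split()
    common = Counter(pred_toks) & Counter(ref_toks)
    overlap = sum(w(t) * c for t, c in common.items())
    precision = overlap / max(sum(w(t) for t in pred_toks), 1e-9)
    recall = overlap / max(sum(w(t) for t in ref_toks), 1e-9)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

keyphrase_weights = {"1903": 1.0, "nobel": 0.8, "physics": 0.8}  # pretend scores
print(weighted_f1("she won it in 1903",
                  "the nobel prize in physics in 1903", keyphrase_weights))
```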
arXiv Detail & Related papers (2020-05-01T03:24:36Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
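As a simplified sketch of the template idea (the wh-word mapping and the "replace the answer span with a wh-word" template below are illustrative, not the paper's exact templates):

```python
# Sketch: template-based question generation from a retrieved sentence.
# The answer-masking template is an illustrative simplification.
def template_question(sentence: str, answer: str, wh_word: str = "what") -> str:
    # Mask the answer span and turn the sentence into a cloze-style question.
    masked = sentence.replace(answer, wh_word, 1)
    return masked.rstrip(". ") + "?"

retrieved = "The Eiffel Tower was completed in 1889."
print(template_question(retrieved, "1889", wh_word="what year"))
# -> "The Eiffel Tower was completed in what year?"
```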
arXiv Detail & Related papers (2020-04-24T17:57:45Z)