Assessing Distractors in Multiple-Choice Tests
- URL: http://arxiv.org/abs/2311.04554v1
- Date: Wed, 8 Nov 2023 09:37:09 GMT
- Title: Assessing Distractors in Multiple-Choice Tests
- Authors: Vatsal Raina, Adian Liusie, Mark Gales
- Abstract summary: We propose metrics for the quality of distractors in multiple-choice reading comprehension tests.
Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options.
- Score: 10.179963650540056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiple-choice tests are a common approach for assessing candidates'
comprehension skills. Standard multiple-choice reading comprehension exams
require candidates to select the correct answer option from a discrete set
based on a question in relation to a contextual passage. For appropriate
assessment, the distractor answer options must by definition be incorrect but
plausible and diverse. However, generating good quality distractors satisfying
these criteria is a challenging task for content creators. We propose automated
assessment metrics for the quality of distractors in multiple-choice reading
comprehension tests. Specifically, we define quality in terms of the
incorrectness, plausibility and diversity of the distractor options. We assess
incorrectness using the classification ability of a binary multiple-choice
reading comprehension system. Plausibility is assessed by considering the
distractor confidence - the probability mass associated with the distractor
options for a standard multi-class multiple-choice reading comprehension
system. Diversity is assessed by pairwise comparison of an embedding-based
equivalence metric between the distractors of a question. To further validate
the plausibility metric, we compare it against candidate response distributions
over multiple-choice questions and measure its agreement with a ChatGPT model's
interpretation of distractor plausibility and diversity.
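
To make these metrics concrete, the sketch below shows one way incorrectness, plausibility and diversity could be computed for a single question. The binary classifier scores, the 4-way MCRC logits and the all-MiniLM-L6-v2 embedding model are illustrative assumptions rather than the paper's trained systems.

```python
# Minimal sketch of the three distractor-quality metrics (assumed interfaces,
# not the authors' released code): a trained MCRC model would supply the
# logits/scores in practice.
import itertools

import torch
from sentence_transformers import SentenceTransformer, util


def plausibility(option_logits: torch.Tensor, correct_idx: int) -> float:
    """Probability mass a multi-class MCRC system places on the distractors."""
    probs = torch.softmax(option_logits, dim=-1)
    return float(1.0 - probs[correct_idx])


def incorrectness(binary_scores: list[float], correct_idx: int) -> float:
    """Fraction of distractors that a binary (context, question, option)
    classifier judges incorrect; binary_scores[i] = P(option i is correct)."""
    distractors = [s for i, s in enumerate(binary_scores) if i != correct_idx]
    return sum(1.0 for s in distractors if s < 0.5) / len(distractors)


# Assumed off-the-shelf sentence embedder standing in for the paper's
# embedding-based equivalence metric.
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def diversity(distractor_texts: list[str]) -> float:
    """One minus the mean pairwise cosine similarity between distractors."""
    emb = embedder.encode(distractor_texts, convert_to_tensor=True)
    sims = [float(util.cos_sim(emb[i], emb[j]))
            for i, j in itertools.combinations(range(len(distractor_texts)), 2)]
    return 1.0 - sum(sims) / len(sims)


# Toy example: option 0 is the correct answer of a 4-option question.
logits = torch.tensor([2.1, 0.7, 0.4, -0.3])  # assumed MCRC logits
print("plausibility:", plausibility(logits, correct_idx=0))
print("incorrectness:", incorrectness([0.9, 0.2, 0.6, 0.1], correct_idx=0))
print("diversity:", diversity(["in the garden", "at the park", "on the moon"]))
```

Under this toy scoring, higher values indicate stronger incorrectness, plausibility and diversity respectively; in practice the per-question scores would be aggregated over a full test.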
Related papers
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, the other answer choices can provide valuable clues for choosing the right answer.
Existing models often rank each choice separately, overlooking the context provided by other choices.
We propose a novel model by differentiating choices through identifying and eliminating their commonality, called DCQA.
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
- QUDSELECT: Selective Decoding for Questions Under Discussion Parsing [90.92351108691014]
Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences.
We introduce QUDSELECT, a joint-training framework that selectively decodes the QUD dependency structures considering the QUD criteria.
Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation.
arXiv Detail & Related papers (2024-08-02T06:46:08Z)
- Analyzing Multiple-Choice Reading and Listening Comprehension Tests [0.0]
This work investigates how much of the contextual passage needs to be read in multiple-choice reading comprehension tests, and how much of the conversation transcription in listening comprehension tests, to work out the correct answer.
We find that automated reading comprehension systems can perform significantly better than random with partial or even no access to the context passage.
arXiv Detail & Related papers (2023-07-03T14:55:02Z)
- Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution [38.58190457533888]
We introduce the task of candidate distribution matching, propose several evaluation metrics for the task, and demonstrate that automatic systems trained on RACE++ can be leveraged as baselines for our task (a minimal sketch of one possible distribution comparison appears after this list).
We further demonstrate that these automatic systems can be used for practical pre-test evaluation tasks such as detecting underperforming distractors.
arXiv Detail & Related papers (2023-06-22T17:13:08Z)
- Multi-Label Quantification [78.83284164605473]
Quantification, variously called "labelled prevalence estimation" or "learning to quantify", is the supervised learning task of generating predictors of the relative frequencies of the classes of interest in unlabelled data samples.
We propose methods for inferring estimators of class prevalence values that strive to leverage the dependencies among the classes of interest in order to predict their relative frequencies more accurately.
arXiv Detail & Related papers (2022-11-15T11:29:59Z)
- Multiple-Choice Question Generation: Towards an Automated Assessment Framework [0.0]
Transformer-based pretrained language models have demonstrated the ability to produce appropriate questions from a context paragraph.
We focus on a fully automated multiple-choice question generation (MCQG) system where both the question and possible answers must be generated from the context paragraph.
arXiv Detail & Related papers (2022-09-23T19:51:46Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on the adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- Generating Adequate Distractors for Multiple-Choice Questions [7.966913971277812]
Our method is a combination of part-of-speech tagging, named-entity tagging, semantic-role labeling, regular expressions, domain knowledge bases, word embeddings, word edit distance, WordNet, and other algorithms.
Through experiments with human judges, we show that each MCQ has at least one adequate distractor and that 84% of evaluations have three adequate distractors.
arXiv Detail & Related papers (2020-10-23T20:47:58Z)
- MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for Answer Selection [59.95429407899612]
We propose a novel reinforcement learning based multi-step ranking model, named MS-Ranker.
We explicitly consider the potential correctness of candidates and update the evidence with a gating mechanism.
Our model significantly outperforms existing methods that do not rely on external resources.
arXiv Detail & Related papers (2020-10-10T10:36:58Z)
- Uncertainty-aware Score Distribution Learning for Action Quality Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA).
Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores.
Under the circumstance where fine-grained score labels are available, we devise a multi-path uncertainty-aware score distributions learning (MUSDL) method to explore the disentangled components of a score.
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
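
As referenced in the Cambridge dataset entry above, here is a minimal sketch of candidate distribution matching for a single question: a system's predicted option distribution is compared with the empirical distribution of real candidates' selections. The use of KL divergence and the example numbers are assumptions for illustration, not the evaluation metrics that paper proposes.

```python
# Hedged sketch: compare a system's predicted answer-option distribution with
# the empirical distribution of candidates' selections for one question.
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    """KL(p || q) over the answer options; lower means a closer match."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Made-up numbers: 60% of candidates chose option A, 25% chose B, and so on.
candidate_dist = np.array([0.60, 0.25, 0.10, 0.05])
system_dist = np.array([0.70, 0.15, 0.10, 0.05])  # assumed model softmax output
print(f"KL(candidates || system) = {kl_divergence(candidate_dist, system_dist):.4f}")
```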