Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with
a Focus on Candidate Response Distribution
- URL: http://arxiv.org/abs/2306.13047v4
- Date: Sun, 15 Oct 2023 20:58:26 GMT
- Title: Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with
a Focus on Candidate Response Distribution
- Authors: Adian Liusie, Vatsal Raina, Andrew Mullooly, Kate Knill, Mark J. F.
Gales
- Abstract summary: We introduce the task of candidate distribution matching, propose several evaluation metrics for the task, and demonstrate that automatic systems trained on RACE++ can be leveraged as baselines for our task.
We further demonstrate that these automatic systems can be used for practical pre-test evaluation tasks such as detecting underperforming distractors.
- Score: 38.58190457533888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiple choice exams are widely used to assess candidates across a diverse
range of domains and tasks. To moderate question quality, newly proposed
questions often pass through pre-test evaluation stages before being deployed
into real-world exams. Currently, this evaluation process is manually
intensive, which can lead to time lags in the question development cycle.
Streamlining this process via automation can significantly enhance efficiency;
however, there is currently a lack of datasets with adequate pre-test analysis
information. In this paper we analyse a subset of the public Cambridge
Multiple-Choice Questions Reading Database released by Cambridge University
Press & Assessment: a multiple-choice comprehension dataset of questions at
different target levels, with corresponding candidate selection distributions.
We introduce the task of candidate distribution matching, propose several
evaluation metrics for the task, and demonstrate that automatic systems trained
on RACE++ can be leveraged as baselines for our task. We further demonstrate
that these automatic systems can be used for practical pre-test evaluation
tasks such as detecting underperforming distractors, where our detection
systems can automatically identify poor distractors that few candidates select.
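The abstract describes two concrete outputs: comparing a system's predicted candidate selection distribution against the observed one, and flagging distractors that few candidates actually select. Below is a minimal sketch of what such an evaluation could look like, assuming per-question selection distributions over the answer options; the specific metrics (KL divergence, total variation) and the 5% selection threshold are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: candidate distribution matching and underperforming-distractor
# detection. Metric choices and the threshold are assumptions for illustration.
import numpy as np

def kl_divergence(p_true, p_pred, eps=1e-12):
    """KL(p_true || p_pred) between observed and predicted selection distributions."""
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.sum(p_true * np.log((p_true + eps) / (p_pred + eps))))

def total_variation(p_true, p_pred):
    """Total variation distance: half the L1 distance between the two distributions."""
    diff = np.asarray(p_true, dtype=float) - np.asarray(p_pred, dtype=float)
    return 0.5 * float(np.abs(diff).sum())

def underperforming_distractors(p_observed, answer_idx, threshold=0.05):
    """Indices of incorrect options chosen by fewer than `threshold` of candidates."""
    return [i for i, p in enumerate(p_observed)
            if i != answer_idx and p < threshold]

# Example: a 4-option question where option 0 is correct and option 3 is rarely chosen.
observed = [0.55, 0.25, 0.17, 0.03]   # observed candidate selection distribution
predicted = [0.60, 0.20, 0.15, 0.05]  # system-predicted distribution
print(kl_divergence(observed, predicted))
print(total_variation(observed, predicted))
print(underperforming_distractors(observed, answer_idx=0))  # -> [3]
```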
Related papers
- On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of the exemplars included in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG on an automatically-generated synthetic exam composed of multiple choice questions.
arXiv Detail & Related papers (2024-05-22T13:14:11Z) - Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation [20.14906249952034]
The distractor generation task focuses on generating incorrect but plausible options for objective questions.
The evolution of artificial intelligence (AI) has transitioned the task from traditional methods to the use of neural networks and pre-trained language models.
This survey explores distractor generation tasks, datasets, methods, and current evaluation metrics for English objective questions.
arXiv Detail & Related papers (2024-02-02T15:53:31Z) - Assessing Distractors in Multiple-Choice Tests [10.179963650540056]
We propose metrics for the quality of distractors in multiple-choice reading comprehension tests.
Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options.
arXiv Detail & Related papers (2023-11-08T09:37:09Z) - Reinforcement Learning Guided Multi-Objective Exam Paper Generation [21.945655389912112]
We propose a reinforcement learning guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimizes three exam domain-specific objectives, including difficulty degree, distribution of exam scores, and skill coverage.
We show that MOEPG is feasible in addressing the multiple dilemmas of the exam paper generation scenario.
arXiv Detail & Related papers (2023-03-02T07:55:52Z) - Revisiting Long-tailed Image Classification: Survey and Benchmarks with
New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z) - Multiple-Choice Question Generation: Towards an Automated Assessment
Framework [0.0]
Transformer-based pretrained language models have demonstrated the ability to produce appropriate questions from a context paragraph.
We focus on a fully automated multiple-choice question generation (MCQG) system where both the question and possible answers must be generated from the context paragraph.
arXiv Detail & Related papers (2022-09-23T19:51:46Z) - Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z) - MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for
Answer Selection [59.95429407899612]
We propose a novel reinforcement learning based multi-step ranking model, named MS-Ranker.
We explicitly consider the potential correctness of candidates and update the evidence with a gating mechanism.
Our model significantly outperforms existing methods that do not rely on external resources.
arXiv Detail & Related papers (2020-10-10T10:36:58Z)