Generating Adequate Distractors for Multiple-Choice Questions
- URL: http://arxiv.org/abs/2010.12658v1
- Date: Fri, 23 Oct 2020 20:47:58 GMT
- Title: Generating Adequate Distractors for Multiple-Choice Questions
- Authors: Cheng Zhang, Yicheng Sun, Hejia Chen, Jie Wang
- Abstract summary: Our method is a combination of part-of-speech tagging, named-entity tagging, semantic-role labeling, regular expressions, domain knowledge bases, word embeddings, word edit distance, WordNet, and other algorithms.
We show, via experiments and evaluations by human judges, that each MCQ has at least one adequate distractor and 84% of MCQs have three adequate distractors.
- Score: 7.966913971277812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach to automatic generation of adequate
distractors for a given question-answer pair (QAP) generated from a given
article to form an adequate multiple-choice question (MCQ). Our method is a
combination of part-of-speech tagging, named-entity tagging, semantic-role
labeling, regular expressions, domain knowledge bases, word embeddings, word
edit distance, WordNet, and other algorithms. We use the US SAT (Scholastic
Assessment Test) practice reading tests as a dataset to produce QAPs and
generate three distractors for each QAP to form an MCQ. We show that, via
experiments and evaluations by human judges, each MCQ has at least one adequate
distractor and 84% of MCQs have three adequate distractors.
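To make the toolbox named in the abstract concrete, below is a minimal sketch of one distractor-generation strategy it mentions: proposing WordNet sibling synsets (co-hyponyms) of the answer and filtering near-duplicates with word edit distance. The helper name, thresholds, and candidate limit are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): WordNet sibling synsets as distractor
# candidates, filtered by word edit distance. Thresholds are illustrative.
from nltk.corpus import wordnet as wn              # requires nltk.download("wordnet")
from nltk.metrics.distance import edit_distance

def sibling_distractors(answer, pos=wn.NOUN, max_candidates=3, min_edit=3):
    """Collect words that share a hypernym with `answer` but differ in surface form."""
    candidates = []
    for synset in wn.synsets(answer, pos=pos):
        for hypernym in synset.hypernyms():
            for sibling in hypernym.hyponyms():    # siblings = co-hyponyms of the answer
                for lemma in sibling.lemma_names():
                    word = lemma.replace("_", " ")
                    if word.lower() != answer.lower() and edit_distance(word, answer) >= min_edit:
                        candidates.append(word)
    seen, unique = set(), []                       # deduplicate, keep first occurrences
    for w in candidates:
        if w.lower() not in seen:
            seen.add(w.lower())
            unique.append(w)
    return unique[:max_candidates]

print(sibling_distractors("telescope"))
```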
Related papers
- Constructing Cloze Questions Generatively [2.2719421441459406]
We present a generative method for constructing cloze questions from an article using neural networks and WordNet.
CQG selects an answer key for a given sentence, segments it into a sequence of instances, generates instance-level distractor candidates (IDCs) using a transformer and sibling synsets.
It then removes inappropriate IDCs, ranks the remaining IDCs based on contextual embedding similarities, as well as synset and lexical relatedness, forms distractor candidates by replacing instances with the corresponding top-ranked IDCs, and checks if they are legitimate phrases.
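A hedged sketch of the embedding-ranking step described above: each IDC is scored by how closely the sentence it produces stays to the original context. The model choice and scoring scheme are assumptions for illustration; CQG additionally uses synset and lexical relatedness, which are not shown here.

```python
# Illustrative only: rank instance-level distractor candidates (IDCs) by contextual
# embedding similarity between the original sentence and the rewritten sentence.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def rank_idcs(sentence, instance, idcs):
    """Return (idc, similarity) pairs, highest similarity first."""
    original = model.encode(sentence, convert_to_tensor=True)
    rewrites = [sentence.replace(instance, idc) for idc in idcs]
    scores = util.cos_sim(model.encode(rewrites, convert_to_tensor=True), original).squeeze(1)
    return sorted(zip(idcs, scores.tolist()), key=lambda p: p[1], reverse=True)

print(rank_idcs("The treaty was signed in Paris in 1783.", "Paris", ["London", "Vienna", "Berlin"]))
```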
arXiv Detail & Related papers (2024-10-05T18:55:38Z)
- LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs [61.57691505683534]
Non-Factoid (NF) Question Answering (QA) is challenging to evaluate because it admits many diverse answers and lacks an objective criterion.
Large Language Models (LLMs) have been adopted for NFQA evaluation due to their compelling performance on various NLP tasks.
We propose a novel listwise NFQA evaluation approach that utilizes LLMs to rank candidate answers within a list of reference answers sorted by descending quality.
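A rough illustration of the listwise idea, assuming a simple prompt template: the LLM is asked to place a candidate answer into a reference list already sorted from best to worst. LINKAGE's actual template and scoring procedure are defined in the paper.

```python
# Illustrative prompt construction only; the wording is an assumption, not LINKAGE's template.
def build_listwise_prompt(question, references_best_to_worst, candidate):
    refs = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(references_best_to_worst))
    return (
        "You will rank an answer to a non-factoid question.\n"
        f"Question: {question}\n"
        "Reference answers, sorted from highest to lowest quality:\n"
        f"{refs}\n"
        f"Candidate answer: {candidate}\n"
        "At which position (1 = best) should the candidate be inserted? Reply with a number."
    )

print(build_listwise_prompt(
    "Why do leaves change color in autumn?",
    ["Chlorophyll breaks down, unmasking carotenoid pigments.", "Because it gets cold."],
    "Shorter days slow chlorophyll production, so yellow and red pigments show through.",
))
```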
arXiv Detail & Related papers (2024-09-23T06:42:21Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- Assessing Distractors in Multiple-Choice Tests [10.179963650540056]
We propose metrics for the quality of distractors in multiple-choice reading comprehension tests.
Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options.
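As a rough proxy for the three qualities named above, one could score a distractor set as sketched below, under the assumption that embedding similarity is an acceptable stand-in; the paper defines its own metrics, which differ in detail.

```python
# Proxy metrics only: incorrectness, plausibility and diversity approximated with
# embedding similarity; not the paper's definitions.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def distractor_scores(question, answer, distractors):
    q = model.encode(question, convert_to_tensor=True)
    a = model.encode(answer, convert_to_tensor=True)
    ds = model.encode(distractors, convert_to_tensor=True)
    incorrectness = 1 - util.cos_sim(ds, a).mean().item()    # far from the correct answer
    plausibility = util.cos_sim(ds, q).mean().item()          # still related to the question
    pairs = list(combinations(range(len(distractors)), 2))
    diversity = sum(1 - util.cos_sim(ds[i], ds[j]).item() for i, j in pairs) / max(len(pairs), 1)
    return {"incorrectness": incorrectness, "plausibility": plausibility, "diversity": diversity}

print(distractor_scores("When was the treaty signed?", "1783", ["1789", "1776", "1801"]))
```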
arXiv Detail & Related papers (2023-11-08T09:37:09Z)
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
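The general shape of a metric that uses both positive and negative references can be sketched as below: reward similarity to correct references and penalize similarity to incorrect ones. This is only the general idea; SQuArE's actual formulation and models are specified in the paper.

```python
# Illustrative only: not SQuArE's formulation, just the positive-minus-negative idea.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def reference_based_score(candidate, positive_refs, negative_refs):
    c = model.encode(candidate, convert_to_tensor=True)
    pos = util.cos_sim(c, model.encode(positive_refs, convert_to_tensor=True)).mean().item()
    neg = util.cos_sim(c, model.encode(negative_refs, convert_to_tensor=True)).mean().item()
    return pos - neg
```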
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- EMBRACE: Evaluation and Modifications for Boosting RACE [0.0]
RACE is a dataset of English texts and corresponding multiple-choice questions (MCQs).
RACE was constructed by Chinese teachers of English for human reading comprehension.
This article provides a detailed analysis of the test set of RACE for high-school students.
arXiv Detail & Related papers (2023-05-15T08:21:32Z)
- Automatic Generation of Multiple-Choice Questions [7.310488568715925]
We present two methods to tackle the challenge of QAP generation.
The first is a deep-learning-based end-to-end question generation system built on the T5 transformer with preprocessing and postprocessing pipelines; a minimal sketch follows below.
The second is a sequence-learning-based scheme that generates adequate QAPs via meta-sequence representations of sentences.
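A minimal sketch of the first approach, using an off-the-shelf T5 checkpoint as a stand-in; the paper trains its own model and wraps it in preprocessing and postprocessing pipelines not shown here, and the input format below is an assumption.

```python
# Stand-in sketch: answer-aware question generation with T5. A base checkpoint is used
# for illustration; in practice a checkpoint fine-tuned for question generation is needed.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate_question(context, answer):
    # One common input format for answer-aware QG (an assumption, not the paper's pipeline).
    prompt = f"generate question: answer: {answer} context: {context}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate_question("The Eiffel Tower was completed in 1889.", "1889"))
```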
arXiv Detail & Related papers (2023-03-25T22:45:54Z)
- Tag-Set-Sequence Learning for Generating Question-Answer Pairs [10.48660454637293]
We present a new method called tag-set-sequence learning to tackle the problem of generating adequate question-answer pairs (QAPs) for texts.
We construct a system called TSS-Learner to learn tag-set sequences from given declarative sentences and the corresponding interrogative sentences.
We show that TSS-Learner can indeed generate adequate QAPs for certain texts on which transformer-based models do poorly.
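A rough sketch of what a tag-set sequence might look like, assuming spaCy part-of-speech and entity tags as the tag inventory; the paper defines its own tag sets and how learned sequences are matched to generate QAPs.

```python
# Illustrative only: map each token to a small set of syntactic/semantic tags.
# The tag inventory here (spaCy POS, fine-grained tag, entity label) is an assumption.
import spacy

nlp = spacy.load("en_core_web_sm")   # requires: python -m spacy download en_core_web_sm

def tag_set_sequence(sentence):
    sequence = []
    for token in nlp(sentence):
        tags = {token.pos_, token.tag_}
        if token.ent_type_:
            tags.add(token.ent_type_)            # include the named-entity label if present
        sequence.append(frozenset(tags))
    return sequence

print(tag_set_sequence("Marie Curie won the Nobel Prize in 1903."))
```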
arXiv Detail & Related papers (2022-10-20T21:51:00Z)
- PACIFIC: Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance [96.06505049126345]
We present a new dataset, named PACIFIC. Compared with existing CQA datasets, PACIFIC exhibits three key features: (i) proactivity, (ii) numerical reasoning, and (iii) hybrid context of tables and text.
A new task is defined accordingly to study Proactive Conversational Question Answering (PCQA), which combines clarification question generation and CQA.
The proposed UniPCQA model performs multi-task learning over all sub-tasks in PCQA and incorporates a simple ensemble strategy to alleviate error propagation in the multi-task learning by cross-validating top-$k$ sampled Seq2Seq outputs.
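A schematic illustration of the ensemble idea, using a simple consensus heuristic over sampled outputs; UniPCQA's actual cross-validation of top-$k$ Seq2Seq samples is more involved.

```python
# Generic consensus over sampled decoder outputs; not UniPCQA's exact procedure.
from collections import Counter

def consensus_answer(sampled_outputs):
    """Return the sample whose normalized form occurs most often."""
    normalized = [s.strip().lower() for s in sampled_outputs]
    winner, _ = Counter(normalized).most_common(1)[0]
    return next(s for s, n in zip(sampled_outputs, normalized) if n == winner)

print(consensus_answer(["$4.2 million", "4.2 million dollars", "$4.2 million"]))
```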
arXiv Detail & Related papers (2022-10-17T08:06:56Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
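The objective underlying a conditional VAE for QA-pair generation is the conditional evidence lower bound sketched below, with $x$ the QA pair, $c$ the context, and $z$ the latent variable. HCVAE extends this with separate hierarchical latents for question and answer and an information-maximizing term, which are not shown.

```latex
\mathcal{L}(\theta,\phi) =
\mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
- \mathrm{KL}\big(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\big)
```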
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.