Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction
- URL: http://arxiv.org/abs/2501.13125v2
- Date: Sun, 16 Mar 2025 06:33:02 GMT
- Title: Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction
- Authors: Yooseop Lee, Suin Kim, Yohan Jo
- Abstract summary: In designing multiple-choice questions (MCQs) in education, creating plausible distractors is crucial for identifying students' misconceptions and gaps in knowledge. This study presents a pipeline for training a model to generate distractors that are more likely to be selected by students.
- Score: 1.9949730506194254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In designing multiple-choice questions (MCQs) in education, creating plausible distractors is crucial for identifying students' misconceptions and gaps in knowledge and accurately assessing their understanding. However, prior studies on distractor generation have not paid sufficient attention to enhancing the difficulty of distractors, resulting in reduced effectiveness of MCQs. This study presents a pipeline for training a model to generate distractors that are more likely to be selected by students. First, we train a pairwise ranker to reason about students' misconceptions and assess the relative plausibility of two distractors. Using this model, we create a dataset of pairwise distractor ranks and then train a distractor generator via Direct Preference Optimization (DPO) to generate more plausible distractors. Experiments on computer science subjects (Python, DB, MLDL) demonstrate that our pairwise ranker effectively identifies students' potential misunderstandings and achieves ranking accuracy comparable to human experts. Furthermore, our distractor generator outperforms several baselines in generating plausible distractors and produces questions with a higher item discrimination index (DI).
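The abstract's two stages can be made concrete with a short sketch: a pairwise ranker orders two candidate distractors by predicted student plausibility, and the resulting preference pairs train the generator with the standard DPO objective. This is a minimal PyTorch illustration, not the authors' released code; the `rank_pair` interface is hypothetical, while `dpo_loss` follows the standard DPO formulation.

```python
# Minimal PyTorch sketch of the two-stage pipeline: the pairwise ranker
# orders two candidate distractors, and the resulting preference pairs
# train the generator with the standard DPO objective. The `rank_pair`
# interface is hypothetical; `dpo_loss` is the usual DPO formulation.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: prefer the distractor the ranker judged more
    plausible (chosen) over the other (rejected), measured relative to
    a frozen reference model."""
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

def rank_pair(ranker, question, d_a, d_b):
    """Hypothetical ranker call: returns (chosen, rejected) by predicted
    student plausibility. `ranker` is assumed to output a scalar score."""
    return (d_a, d_b) if ranker(question, d_a, d_b) > 0 else (d_b, d_a)

# Toy check: a higher chosen log-prob relative to the reference lowers the loss.
loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-6.0]),
                torch.tensor([-5.0]), torch.tensor([-5.0]))
```

In the paper's pipeline, pairs produced in the first stage form the preference dataset on which the second stage fine-tunes the distractor generator.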
Related papers
- Multimodal Reinforcement Learning with Agentic Verifier for AI Agents [131.46008226323423]
Argos is a principled multimodal reward agent for training reasoning models on agentic tasks. By leveraging our agentic verifier across both SFT data and RL training, our model achieves state-of-the-art results.
arXiv Detail & Related papers (2025-12-03T04:42:47Z) - Learning to Make MISTAKEs: Modeling Incorrect Student Thinking And Key Errors [58.65143578052761]
This paper presents a new method, MISTAKE, that constructs high-quality synthetic examples of reasoning errors. We evaluate MISTAKE on three educational tasks and find that it yields higher accuracy when simulating incorrect student answers.
arXiv Detail & Related papers (2025-10-13T15:10:38Z) - Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory? [13.980638430366625]
Reasoning LLMs are trained to verbalize their reasoning process, yielding strong gains on complex tasks. A key prerequisite is the ability to assess the usefulness of, and build on, another model's partial thinking. This paper investigates the question: can standard solo-reasoning training pipelines deliver the desired off-trajectory behaviors?
arXiv Detail & Related papers (2025-10-07T19:42:50Z) - Personalized Distractor Generation via MCTS-Guided Reasoning Reconstruction [33.217474795590576]
Distractors, incorrect but plausible answer choices in multiple-choice questions (MCQs), play a critical role in educational assessment by diagnosing student misconceptions. Recent work has leveraged large language models (LLMs) to generate shared, group-level distractors. We introduce the task of personalized distractor generation, which aims to generate tailored distractors based on individual misconceptions inferred from each student's past question-answering (QA) records.
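The summary above does not detail the search procedure; for orientation, MCTS-style methods commonly select which reasoning branch to expand with the UCT rule below (a generic formulation, not necessarily the cited paper's exact variant):

```latex
% Generic UCT selection rule used by many MCTS variants (not necessarily
% the variant in the cited paper): pick the branch maximizing average
% value plus an exploration bonus.
a^{*} = \arg\max_{a}\left[\, Q(s,a) + c\,\sqrt{\frac{\ln N(s)}{N(s,a)}} \,\right]
```

Here Q(s,a) is the average value of branch a, N counts visits, and c trades off exploration against exploitation.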
arXiv Detail & Related papers (2025-08-15T03:20:37Z) - Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning [47.764552063499046]
Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still lags behind. We introduce a two-stage framework called Learning to Focus (LeaF) to mitigate confounding factors.
arXiv Detail & Related papers (2025-06-09T15:16:39Z) - Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns [25.90420385230675]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various educational tasks.
Their alignment with human learning patterns, particularly in predicting which incorrect options students are most likely to select in multiple-choice questions (MCQs), remains underexplored.
arXiv Detail & Related papers (2025-02-21T01:43:32Z) - The Imitation Game for Educational AI [23.71250100390303]
We present a novel evaluation framework based on a two-phase Turing-like test.
In Phase 1, students provide open-ended responses to questions, revealing natural misconceptions.
In Phase 2, both AI and human experts, conditioned on each student's specific mistakes, generate distractors for new related questions.
arXiv Detail & Related papers (2025-02-21T01:14:55Z) - Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
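As a rough illustration of the error-injection idea (a toy stand-in, not RISE's actual predefined error types), a hard preference pair can be built by minimally perturbing a correct solution:

```python
# Toy sketch of error-injected pair construction in the spirit of RISE:
# perturb a small span of a correct solution to create a subtly wrong
# "rejected" sample, paired with the original as "chosen". The flip rule
# below is an illustrative stand-in for the paper's predefined errors.
import re

def inject_subtle_error(solution: str) -> str:
    # Flip the first '+' to '-' to produce a near-miss negative example.
    return re.sub(r"\+", "-", solution, count=1)

correct = "12 + 7 = 19, so the answer is 19."
pair = {"chosen": correct, "rejected": inject_subtle_error(correct)}
# pair["rejected"] == "12 - 7 = 19, so the answer is 19."  (subtly wrong)
```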
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
LLMs struggle to precisely detect students' errors and tailor their feedback to these errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z) - Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank [44.04217284677347]
We propose a novel method to enhance the quality of generated distractors through overgenerate-and-rank.
Our ranking model increases alignment with human-authored distractors, although human-authored ones are still preferred over generated ones.
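A generic overgenerate-and-rank loop looks like the sketch below; `generate` and `score` are placeholders for the paper's LLM sampler and ranking model, not its actual interfaces.

```python
# Generic overgenerate-and-rank loop: sample many candidate distractors,
# deduplicate while preserving order, and keep the top-k by a learned
# plausibility score.
from typing import Callable, List

def overgenerate_and_rank(question: str,
                          generate: Callable[[str, int], List[str]],
                          score: Callable[[str, str], float],
                          n_candidates: int = 20,
                          k: int = 3) -> List[str]:
    candidates = list(dict.fromkeys(generate(question, n_candidates)))
    candidates.sort(key=lambda d: score(question, d), reverse=True)
    return candidates[:k]
```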
arXiv Detail & Related papers (2024-04-19T00:25:44Z) - Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning [43.83422798569986]
Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and offer a reliable form of assessment.
To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers.
We propose a simple, in-context learning-based solution for automated distractor and corresponding feedback message generation.
arXiv Detail & Related papers (2023-08-07T01:03:04Z) - Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
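Noise-contrastive estimation for state representations is commonly instantiated as an InfoNCE loss; the sketch below shows that generic form, which may differ from the cited paper's exact objective.

```python
# Generic InfoNCE-style contrastive loss for state representations:
# matching (anchor, positive) rows in a batch are positives and all
# other rows serve as negatives. A common instantiation of
# noise-contrastive estimation, not necessarily the paper's exact form.
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature          # [B, B] similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```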
arXiv Detail & Related papers (2023-03-31T18:03:30Z) - Cognitive Diagnosis with Explicit Student Vector Estimation and Unsupervised Question Matrix Learning [53.79108239032941]
We propose an explicit student vector estimation (ESVE) method to estimate the student vectors of DINA.
We also propose an unsupervised heuristic bidirectional calibration algorithm (HBCA) to label the Q-matrix automatically.
The experimental results on two real-world datasets show that ESVE-DINA outperforms the DINA model on accuracy and that the Q-matrix labeled automatically by HBCA can achieve performance comparable to that obtained with the manually labeled Q-matrix.
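For reference, the DINA model whose student vectors ESVE estimates is standardly written as follows (the textbook formulation from the psychometrics literature, not taken from the abstract above):

```latex
% Standard DINA model: student i answers item j correctly with
% probability governed by slip (s_j) and guess (g_j) parameters.
\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}}, \qquad
P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\eta_{ij}} \, g_j^{\,1-\eta_{ij}}
```

Here \alpha_i is student i's binary skill-mastery vector, q_{jk} the Q-matrix entry for item j and skill k, and s_j, g_j the item's slip and guess parameters.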
arXiv Detail & Related papers (2022-03-01T03:53:19Z) - The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning [59.777127897688594]
We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
arXiv Detail & Related papers (2022-01-24T13:18:02Z) - BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset [0.0]
We present a new BERT-based method for automatically generating distractors using only a small-scale dataset.
Evaluation shows that from a student's perspective, our method generated one or more plausible distractors for more than 50% of the MCQs in our test set.
arXiv Detail & Related papers (2021-08-09T12:15:47Z)
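A masked-LM approach of this kind can be illustrated with the Hugging Face fill-mask pipeline; the multilingual checkpoint and English example below are placeholders, since the paper's Swedish model and data are not reproduced here.

```python
# Generic illustration of BERT-based distractor candidates via masked-LM
# filling: mask the answer slot and take alternative high-probability
# fills as candidate distractors. The checkpoint and example sentence
# are assumptions, not the cited paper's setup.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")
stem = "Stockholm is the [MASK] of Sweden."
answer = "capital"

# Alternative fills (excluding the correct answer) serve as distractor candidates.
candidates = [p["token_str"] for p in unmasker(stem, top_k=10)
              if p["token_str"].strip().lower() != answer]
print(candidates[:3])
```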