Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models
- URL: http://arxiv.org/abs/2404.02124v3
- Date: Thu, 18 Apr 2024 17:12:19 GMT
- Title: Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models
- Authors: Wanyong Feng, Jaewook Lee, Hunter McNichols, Alexander Scarlatos, Digory Smith, Simon Woodhead, Nancy Otero Ornelas, Andrew Lan
- Abstract summary: Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and are a reliable format for assessments and practice.
One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students.
To date, the task of crafting high-quality distractors largely remains a labor- and time-intensive process for teachers and learning content designers, which limits scalability.
- Score: 40.50115385623107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and are a reliable format for assessments and practice. One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students. To date, the task of crafting high-quality distractors largely remains a labor- and time-intensive process for teachers and learning content designers, which limits scalability. In this work, we study the task of automated distractor generation in the domain of math MCQs and explore a wide variety of large language model (LLM)-based approaches, from in-context learning to fine-tuning. We conduct extensive experiments using a real-world math MCQ dataset and find that although LLMs can generate some mathematically valid distractors, they are less adept at anticipating common errors or misconceptions among real students.
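The in-context learning approach mentioned in the abstract amounts to prompting an LLM with a few question/answer/distractor exemplars and asking it to propose error-targeted distractors for a new question. Below is a minimal sketch of that setup, assuming an OpenAI-style chat API; the exemplars, prompt wording, and model name are illustrative placeholders and are not taken from the paper or its dataset.

```python
# Minimal few-shot (in-context learning) sketch for math MCQ distractor
# generation. Requires the openai package and an OPENAI_API_KEY in the
# environment; all questions and distractors below are made-up examples.
from openai import OpenAI

client = OpenAI()

# Few-shot exemplars pairing each question/answer with distractors that
# target common student errors (hypothetical examples).
EXEMPLARS = """\
Question: Simplify 3/6.
Correct answer: 1/2
Distractors: 2 (inverted the fraction), 1/3 (subtracted numerator from denominator), 3 (divided denominator by numerator, dropped the fraction)

Question: What is 0.5 x 0.2?
Correct answer: 0.1
Distractors: 1.0 (ignored decimal places), 0.7 (added instead of multiplied), 0.25 (squared 0.5)
"""

def generate_distractors(question: str, answer: str, n: int = 3) -> str:
    """Ask the LLM for n incorrect options aimed at plausible student errors."""
    prompt = (
        "You write distractors for math multiple-choice questions. "
        "Each distractor should reflect a common student error or misconception.\n\n"
        f"{EXEMPLARS}\n"
        f"Question: {question}\n"
        f"Correct answer: {answer}\n"
        f"Give {n} distractors, each with the error it targets."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper compares several LLM variants
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_distractors("Solve for x: 2x + 3 = 11", "x = 4"))
```

The fine-tuning approaches the paper also explores would instead train on (question, answer, distractor) triples from the dataset; the sketch above only illustrates the prompting-based end of that spectrum.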
Related papers
- ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection [60.297079601066784]
We introduce ErrorRadar, the first benchmark designed to assess MLLMs' capabilities in error detection.
ErrorRadar evaluates two sub-tasks: error step identification and error categorization.
It consists of 2,500 high-quality multimodal K-12 mathematical problems, collected from real-world student interactions.
Results indicate that significant challenges remain, as the best-performing model, GPT-4o, still trails human evaluation by around 10%.
arXiv Detail & Related papers (2024-10-06T14:59:09Z) - MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains: Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming, and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions [42.148511874019256]
We introduce DiVERT, a novel variational approach that learns an interpretable representation of errors behind distractors in math multiple-choice questions (MCQs).
We show that DiVERT, despite using a base open-source LLM with 7B parameters, outperforms state-of-the-art approaches using GPT-4o on downstream distractor generation.
We also conduct a human evaluation with math educators and find that DiVERT leads to error labels that are of comparable quality to human-authored ones.
arXiv Detail & Related papers (2024-06-27T17:37:31Z) - Math Multiple Choice Question Generation via Human-Large Language Model Collaboration [5.081508251092439]
Multiple choice questions (MCQs) are a popular method for evaluating students' knowledge.
Recent advances in large language models (LLMs) have sparked interest in automating MCQ creation.
This paper introduces a prototype tool designed to facilitate collaboration between LLMs and educators.
arXiv Detail & Related papers (2024-05-01T20:53:13Z) - Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank [44.04217284677347]
We propose a novel method to enhance the quality of generated distractors through overgenerate-and-rank.
Our ranking model increases alignment with human-authored distractors, although human-authored ones are still preferred over generated ones.
arXiv Detail & Related papers (2024-04-19T00:25:44Z) - Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges [60.62904929065257]
Large language models (LLMs) offer a possible way to resolve this issue by comprehending individual requests.
This paper reviews the recently emerged LLM research related to educational capabilities, including mathematics, writing, programming, reasoning, and knowledge-based question answering.
arXiv Detail & Related papers (2023-12-27T14:37:32Z) - Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning [43.83422798569986]
Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and are a reliable form of assessment.
To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers.
We propose a simple, in-context learning-based solution for automated distractor and corresponding feedback message generation.
arXiv Detail & Related papers (2023-08-07T01:03:04Z) - Learning to Reuse Distractors to support Multiple Choice Question Generation in Education [19.408786425460498]
This paper studies how a large existing set of manually created answers and distractors can be leveraged to help teachers in creating new multiple choice questions (MCQs).
We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models.
Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach.
arXiv Detail & Related papers (2022-10-25T12:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.