Difficulty-Controllable Cloze Question Distractor Generation
- URL: http://arxiv.org/abs/2511.01526v1
- Date: Mon, 03 Nov 2025 12:42:25 GMT
- Title: Difficulty-Controllable Cloze Question Distractor Generation
- Authors: Seokhoon Kang, Yejin Jeon, Seonjeong Hwang, Gary Geunbae Lee
- Abstract summary: Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. We propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. We show that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.
- Score: 20.062590379176218
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods often lack adaptability and control over difficulty levels, and the absence of difficulty-annotated datasets further hinders progress. To address these issues, we propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. First, to create a high-quality, difficulty-annotated dataset, we introduce a two-way distractor generation process in order to produce diverse and plausible distractors. These candidates are subsequently refined through filtering and then categorized by difficulty using an ensemble QA system. Second, this newly created dataset is leveraged to train a difficulty-controllable generation model via multitask learning. The framework includes carefully designed auxiliary tasks that enhance the model's semantic understanding of distractors and its ability to estimate their difficulty. Experimental results demonstrate that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.
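The abstract says candidate distractors are "categorized by difficulty using an ensemble QA system" but gives no details. The following is a minimal sketch of one way such labeling could work, not the authors' method. It assumes a distractor is harder when it fools a larger share of QA models. The names `make_toy_model` and `label_difficulty`, the 0.25/0.75 thresholds, and the three-level easy/medium/hard scale are all hypothetical, and the toy models stand in for real QA systems.

```python
from typing import Callable, Sequence, Set

# Hypothetical interface: a QA model receives the cloze sentence and the
# candidate options, and returns the option it believes fills the blank.
QAModel = Callable[[str, Sequence[str]], str]


def make_toy_model(fooled_by: Set[str]) -> QAModel:
    """Build a stand-in QA model (assumption: options[0] is the true answer).

    The toy model picks a distractor only if it is in `fooled_by`;
    otherwise it returns the correct answer.
    """
    def model(sentence: str, options: Sequence[str]) -> str:
        for option in options[1:]:
            if option in fooled_by:
                return option
        return options[0]
    return model


def label_difficulty(
    sentence: str,
    answer: str,
    distractor: str,
    models: Sequence[QAModel],
    low: float = 0.25,   # assumed threshold, not from the paper
    high: float = 0.75,  # assumed threshold, not from the paper
) -> str:
    """Label a distractor by the fraction of ensemble members it fools."""
    options = [answer, distractor]
    fooled = sum(1 for m in models if m(sentence, options) == distractor)
    rate = fooled / len(models)
    if rate < low:
        return "easy"
    if rate < high:
        return "medium"
    return "hard"


if __name__ == "__main__":
    ensemble = [
        make_toy_model({"sit", "set"}),
        make_toy_model({"set"}),
        make_toy_model({"set"}),
    ]
    sentence = "The cat ___ on the mat."
    for d in ("ran", "sit", "set"):
        print(d, label_difficulty(sentence, "sat", d, ensemble))
```

With this toy ensemble, "ran" fools no model, "sit" fools one of three, and "set" fools all three, so the labels come out as easy, medium, and hard respectively. A real pipeline would replace the toy models with trained QA systems and would likely calibrate the thresholds against human judgments.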
Related papers
- Learning with Challenges: Adaptive Difficulty-Aware Data Generation for Mobile GUI Agent Training [10.376682582953046]
MobileGen is a novel data generation framework that aligns training difficulty with the GUI agent's capability frontier. It consistently outperforms existing data generation methods by improving the average performance of GUI agents by 1.57 times. This highlights the importance of capability-aligned data generation for effective mobile GUI agent training.
arXiv Detail & Related papers (2026-01-30T10:03:20Z)
- Four Quadrants of Difficulty: A Simple Categorisation and its Limits [4.304007567113229]
We propose a four-quadrant categorisation of difficulty signals -- human vs. model and task-agnostic vs. task-dependent. We find that task-agnostic features behave largely independently and that only task-dependent features align. These findings challenge common Curriculum Learning intuitions and highlight the need for lightweight, task-dependent difficulty estimators.
arXiv Detail & Related papers (2026-01-04T11:31:51Z)
- Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models [54.29243291958429]
We develop a problem generator that reasons explicitly to plan problem directions before synthesis. We treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty. Our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models.
arXiv Detail & Related papers (2025-11-13T03:08:51Z)
- ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning [51.946959481392064]
Large Reasoning Models (LRMs) have shown impressive capabilities in complex problem-solving. We propose ScaleDiff, a pipeline designed to scale the creation of difficult problems. We show that our pipeline can effectively transfer advanced reasoning capabilities without relying on larger, more expensive teacher models.
arXiv Detail & Related papers (2025-09-25T12:22:44Z)
- Can Language Models Follow Multiple Turns of Entangled Instructions? [109.4355301539557]
Real-world scenarios often require consistency across multiple instructions over time. This work presents a systematic investigation of large language models' capabilities in handling multiple turns of instructions. We construct MultiTurnInstruct with $\sim$1.1K high-quality multi-turn conversations through the human-in-the-loop approach.
arXiv Detail & Related papers (2025-03-17T14:31:37Z)
- DAST: Difficulty-Aware Self-Training on Large Language Models [68.30467836807362]
Large Language Model (LLM) self-training methods always under-sample on challenging queries. This work proposes a difficulty-aware self-training framework that focuses on improving the quantity and quality of self-generated responses.
arXiv Detail & Related papers (2025-03-12T03:36:45Z)
- Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks? [74.88417042125985]
We investigate various data-driven strategies that offer supervision data at different quality levels upon tasks of varying complexity. We find that even when the outcome error rate for hard task supervision is high, training on such data can outperform perfectly correct supervision of easier subtasks. Our results also reveal that supplementing hard task supervision with the corresponding subtask supervision can yield notable performance improvements.
arXiv Detail & Related papers (2024-10-27T17:55:27Z)
- Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization [126.27645170941268]
We present Easy2Hard-Bench, a collection of 6 benchmark datasets spanning various domains. Each problem within these datasets is annotated with numerical difficulty scores. We provide a comprehensive analysis of their performance and generalization capabilities across varying levels of difficulty.
arXiv Detail & Related papers (2024-09-27T03:49:56Z)
- Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment [0.6138671548064356]
We propose training pre-trained language models (PLMs) as surrogate models to enable item response theory (IRT) assessment. We also propose two strategies to control the difficulty levels of both the gaps and the distractors using ranking rules to reduce invalid distractors.
arXiv Detail & Related papers (2024-03-03T09:18:05Z)
- Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning [43.83422798569986]
Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer and grade and are a reliable form of assessment. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers. We propose a simple, in-context learning-based solution for automated distractor and corresponding feedback message generation.
arXiv Detail & Related papers (2023-08-07T01:03:04Z)
- Let the Model Decide its Curriculum for Multitask Learning [22.043291547405545]
We propose two classes of techniques to arrange training instances into a learning curriculum based on difficulty scores computed via model-based approaches. We show that instance-level and dataset-level techniques result in strong representations as they lead to an average performance improvement of 4.17% and 3.15% over their respective baselines.
arXiv Detail & Related papers (2022-05-19T23:34:22Z)
- Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting [30.722526598633912]
We argue that Question Generation (QG) systems should have stronger control over the logic of generated questions. We propose a novel framework that progressively increases question difficulty through step-by-step rewriting.
arXiv Detail & Related papers (2021-05-25T06:43:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.