SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction
- URL: http://arxiv.org/abs/2507.05129v1
- Date: Mon, 07 Jul 2025 15:41:38 GMT
- Title: SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction
- Authors: Alexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew Lan
- Abstract summary: We present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability. We show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.
- Score: 41.25292844733891
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Item (question) difficulties play a crucial role in educational assessments, enabling accurate and efficient assessment of student abilities and personalization to maximize learning outcomes. Traditionally, estimating item difficulties can be costly, requiring real students to respond to items, followed by fitting an item response theory (IRT) model to obtain item difficulty estimates. Moreover, this approach cannot be applied in the cold-start setting for previously unseen items. In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. We achieve this alignment using direct preference optimization (DPO), where we form preference pairs based on how likely responses are under a ground-truth IRT model. We then perform a simulation by generating thousands of responses, evaluating them with an LLM-based scoring model, and fitting the resulting data to an IRT model to obtain item difficulty estimates. Through extensive experiments on a real-world student response dataset, we show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.
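The abstract describes two concrete steps: forming DPO preference pairs from a ground-truth IRT model, and fitting an IRT model to simulated, LLM-scored responses to read off item difficulties. The sketch below (not the authors' code) illustrates both under simplifying assumptions: a 1PL (Rasch) model, binary scores, and random stand-ins for the generated responses and the LLM-based scorer. All function and variable names (`prefer_pair`, `p_correct`, etc.) are illustrative, not from the paper.

```python
# Minimal sketch of (1) IRT-likelihood-based preference pairs and
# (2) fitting a Rasch model to simulated responses to recover item difficulties.
# Assumptions: 1PL model, binary scoring, synthetic stand-ins for LLM responses/scorer.
import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, b):
    """Rasch probability of a correct (score-1) response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# --- Step 1: preference pairs for DPO alignment --------------------------------
# For a simulated student instructed to have ability theta, two sampled responses
# are scored (here: 0/1); the one whose score is more likely under the
# ground-truth IRT model becomes the "chosen" response.
def prefer_pair(theta, b_item, score_a, score_b):
    p = p_correct(theta, b_item)
    lik = lambda y: p if y == 1 else 1.0 - p
    return ("a", "b") if lik(score_a) >= lik(score_b) else ("b", "a")

# --- Step 2: simulate responses and fit IRT to estimate item difficulties ------
n_students, n_items = 2000, 30
theta_true = rng.normal(0.0, 1.0, n_students)   # instructed abilities
b_true = rng.normal(0.0, 1.0, n_items)          # difficulties we hope to recover

# Stand-in for "generate responses and score them with an LLM-based scoring model".
responses = rng.binomial(1, p_correct(theta_true[:, None], b_true[None, :])).astype(float)

# Joint maximum-likelihood Rasch fit with plain gradient ascent.
theta_hat, b_hat, lr = np.zeros(n_students), np.zeros(n_items), 0.5
for _ in range(1000):
    resid = responses - p_correct(theta_hat[:, None], b_hat[None, :])
    theta_hat += lr * resid.mean(axis=1)
    b_hat -= lr * resid.mean(axis=0)
    b_hat -= b_hat.mean()                        # fix the scale for identifiability

print("chosen response:", prefer_pair(theta=1.0, b_item=0.0, score_a=1, score_b=0))
print("difficulty recovery (corr):", round(np.corrcoef(b_true, b_hat)[0, 1], 3))
```

In the paper, the scored responses would come from DPO-aligned LLM students and an LLM-based scorer rather than the synthetic draws used here; the fitted difficulty parameters (here `b_hat`) are what serve as the predicted item difficulties.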
Related papers
- Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension? [8.558834738072363]
Large Language Models (LLMs) are increasingly used as proxy students in the development of Intelligent Tutoring Systems (ITSs). We collect a dataset of 489 items from the National Assessment of Educational Progress (NAEP) covering mathematics and reading comprehension in grades 4, 8, and 12. We apply an Item Response Theory (IRT) model to position 11 diverse and state-of-the-art LLMs on the same ability scale as real student populations.
arXiv Detail & Related papers (2025-07-11T00:36:57Z) - Maximally-Informative Retrieval for State Space Model Generation [59.954191072042526]
We introduce Retrieval In-Context Optimization (RICO) to minimize model uncertainty for a particular query at test time. Unlike traditional retrieval-augmented generation (RAG), which relies on external components for document retrieval, our approach leverages direct feedback from the model. We show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and provide connections to the leave-one-out loss.
arXiv Detail & Related papers (2025-06-13T18:08:54Z) - Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents [36.704574105201864]
Large language models (LLMs) are revolutionizing education, with LLM-based agents playing a key role in simulating student behavior. A major challenge in student simulation is modeling the diverse learning patterns of students at various cognitive levels.
arXiv Detail & Related papers (2025-05-26T13:48:49Z) - AdvKT: An Adversarial Multi-Step Training Framework for Knowledge Tracing [64.79967583649407]
Knowledge Tracing (KT) monitors students' knowledge states and simulates their responses to question sequences. Existing KT models typically follow a single-step training paradigm, which leads to significant error accumulation. We propose a novel Adversarial Multi-Step Training Framework for Knowledge Tracing (AdvKT), which focuses on the multi-step KT task.
arXiv Detail & Related papers (2025-04-07T03:31:57Z) - SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models [74.40683913645731]
Zero-shot multi-label recognition (MLR) with Vision-Language Models (VLMs) faces significant challenges without training data, model tuning, or architectural modifications. Our work proposes a novel solution treating VLMs as black boxes, leveraging scores without training data or ground truth. Analysis of these prompt scores reveals VLM biases and "AND"/"OR" signal ambiguities, notably that maximum scores are surprisingly suboptimal compared to second-highest scores.
arXiv Detail & Related papers (2025-02-24T07:15:05Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - A Psychology-based Unified Dynamic Framework for Curriculum Learning [5.410910735259908]
This paper presents a Psychology-based Unified Dynamic Framework for Curriculum Learning (PUDF).
We quantify the difficulty of training data by applying Item Response Theory (IRT) to responses from Artificial Crowds (AC).
We propose a Dynamic Data Selection via Model Ability Estimation (DDS-MAE) strategy to schedule the appropriate amount of data during model training.
arXiv Detail & Related papers (2024-08-09T20:30:37Z) - Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models.
Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Amortised Design Optimization for Item Response Theory [5.076871870091048]
In education, Item Response Theory (IRT) is used to infer student abilities and characteristics of test items from student responses.
In response, we propose incorporating amortised experimental design into IRT.
The computational cost is shifted to a precomputing phase by training a Deep Reinforcement Learning (DRL) agent with synthetic data.
arXiv Detail & Related papers (2023-07-19T10:42:56Z)