Simulated Students in Tutoring Dialogues: Substance or Illusion?
- URL: http://arxiv.org/abs/2601.04025v1
- Date: Wed, 07 Jan 2026 15:44:11 GMT
- Title: Simulated Students in Tutoring Dialogues: Substance or Illusion?
- Authors: Alexander Scarlatos, Jaewook Lee, Simon Woodhead, Andrew Lan
- Abstract summary: This work defines the student simulation task, proposes a set of evaluation metrics that span linguistic, behavioral, and cognitive aspects, and benchmarks a wide range of student simulation methods on these metrics. We experiment on a real-world math tutoring dialogue dataset, where both automated and human evaluation results show that prompting strategies for student simulation perform poorly.
- Score: 45.40380629269521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in large language models (LLMs) enable many new innovations in education. However, evaluating the effectiveness of new technology requires real students, which is time-consuming and hard to scale up. Therefore, many recent works on LLM-powered tutoring solutions have used simulated students for both training and evaluation, often via simple prompting. Surprisingly, little work has been done to ensure or even measure the quality of simulated students. In this work, we formally define the student simulation task, propose a set of evaluation metrics that span linguistic, behavioral, and cognitive aspects, and benchmark a wide range of student simulation methods on these metrics. We experiment on a real-world math tutoring dialogue dataset, where both automated and human evaluation results show that prompting strategies for student simulation perform poorly; supervised fine-tuning and preference optimization yield much better but still limited performance, motivating future work on this challenging task.
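The abstract describes evaluating simulated student utterances against real ones on linguistic criteria. As a minimal sketch of what one such automated check might look like, the snippet below compares a simulated student turn to a reference turn via token-level F1 and a length ratio; the metric choices and example utterances are illustrative assumptions, not the paper's actual evaluation suite.

```python
# Illustrative linguistic-similarity checks between a simulated and a real
# student utterance. These are simple stand-in metrics, not the metrics
# proposed in the paper.
from collections import Counter


def token_f1(simulated: str, reference: str) -> float:
    """Token-level F1 overlap between two utterances (multiset-based)."""
    sim = Counter(simulated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((sim & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(sim.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def length_ratio(simulated: str, reference: str) -> float:
    """How closely the simulated utterance length matches the reference (1.0 = equal)."""
    s, r = len(simulated.split()), len(reference.split())
    return min(s, r) / max(s, r) if max(s, r) else 1.0


real = "i think the answer is 12 because 3 times 4 is 12"
simulated = "the answer is 12 since 3 times 4 equals 12"
print(round(token_f1(simulated, real), 3))
print(round(length_ratio(simulated, real), 3))
```

In practice such surface metrics would sit alongside behavioral and cognitive measures (e.g., whether the simulated student makes realistic errors), which are much harder to automate.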
Related papers
- D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping [66.22412592525369]
We introduce a real-to-sim-to-real engine that leverages Gaussian Splat representations to build a differentiable engine. We show that our engine achieves accurate and robust performance in mass identification across various object geometries and mass values. These optimized mass values facilitate force-aware policy learning, achieving strong performance in object grasping.
arXiv Detail & Related papers (2026-03-01T15:32:04Z) - UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models [59.693733170193944]
Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings. Recent reinforcement learning approaches address this limitation but face two critical challenges. We propose the Unidirectional Cognitive Optimization (UCO) method to address these challenges.
arXiv Detail & Related papers (2025-11-12T01:27:02Z) - SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors [58.87134689752605]
We introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation. We show that even the best LLMs today have limited simulation ability (score: 40.80/100), and that performance scales log-linearly with model size. We demonstrate that simulation ability correlates most strongly with deep, knowledge-intensive reasoning.
arXiv Detail & Related papers (2025-10-20T13:14:38Z) - Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents [36.704574105201864]
Large language models (LLMs) are revolutionizing education, with LLM-based agents playing a key role in simulating student behavior. A major challenge in student simulation is modeling the diverse learning patterns of students at various cognitive levels.
arXiv Detail & Related papers (2025-05-26T13:48:49Z) - MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving [3.2962799070467432]
This paper explores the capabilities of large language models (LLMs) to assess students' math problem-solving processes and provide adaptive feedback. We evaluate the model's ability to support personalized learning in two scenarios: one where the model has access to students' prior answer histories, and another simulating a cold-start context.
arXiv Detail & Related papers (2025-05-23T15:59:39Z) - Can LLMs Simulate Personas with Reversed Performance? A Benchmark for Counterfactual Instruction Following [12.145213376813155]
Large Language Models (LLMs) are increasingly used to simulate personas in virtual environments. We show that even state-of-the-art LLMs cannot simulate personas with reversed performance.
arXiv Detail & Related papers (2025-04-08T22:00:32Z) - Exploring LLM-based Student Simulation for Metacognitive Cultivation [33.346260553878984]
We propose a pipeline for automatically generating and filtering high-quality simulated student agents. Our work paves the way for broader applications in personalized learning and educational assessment.
arXiv Detail & Related papers (2025-02-17T11:12:47Z) - Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z) - MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education [12.364513740761739]
Collaborative problem solving (CPS) is essential in mathematics education, fostering deeper learning through the exchange of ideas. Recent advancements in Large Language Models (LLMs) offer a promising avenue to enhance CPS in mathematics education. We designed and developed MathVC, a multi-persona simulated virtual classroom platform to facilitate CPS in mathematics.
arXiv Detail & Related papers (2024-04-10T03:35:51Z) - Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z) - RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions [10.34673089426247]
We propose a framework for optimizing teaching strategies by constructing a virtual model of the student.
Our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.
arXiv Detail & Related papers (2021-07-31T15:42:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.