AI tutoring can safely and effectively support students: An exploratory RCT in UK classrooms
- URL: http://arxiv.org/abs/2512.23633v1
- Date: Mon, 29 Dec 2025 17:44:03 GMT
- Title: AI tutoring can safely and effectively support students: An exploratory RCT in UK classrooms
- Authors: LearnLM Team, Eedi: Albert Wang, Aliya Rysbek, Andrea Huber, Anjali Nambiar, Anna Kenolty, Ben Caulfield, Beth Lilley-Draper, Bibi Groot, Brian Veprek, Chelsea Burdett, Claire Willis, Craig Barton, Digory Smith, George Mu, Harriet Walters, Irina Jurenka, Iris Hulls, James Stalley-Moores, Jonathan Caton, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Liam McCafferty, Lucy Dalton, Markus Kunesch, Pauline Malubay, Rachel Kidson, Rich Wells, Sam Wheeler, Sara Wiltberger, Shakir Mohamed, Simon Woodhead, Vasco Brazão
- Abstract summary: One-to-one tutoring is widely considered the gold standard for personalized education, yet it remains prohibitively expensive to scale. We conducted an exploratory randomized controlled trial (RCT) with $N = 165$ students across five UK secondary schools. We integrated LearnLM -- a generative AI model fine-tuned for pedagogy -- into chat-based tutoring sessions on the Eedi mathematics platform.
- Score: 3.1642777065752052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One-to-one tutoring is widely considered the gold standard for personalized education, yet it remains prohibitively expensive to scale. To evaluate whether generative AI might help expand access to this resource, we conducted an exploratory randomized controlled trial (RCT) with $N = 165$ students across five UK secondary schools. We integrated LearnLM -- a generative AI model fine-tuned for pedagogy -- into chat-based tutoring sessions on the Eedi mathematics platform. In the RCT, expert tutors directly supervised LearnLM, with the remit to revise each message it drafted until they would be satisfied sending it themselves. LearnLM proved to be a reliable source of pedagogical instruction, with supervising tutors approving 76.4% of its drafted messages making zero or minimal edits (i.e., changing only one or two characters). This translated into effective tutoring support: students guided by LearnLM performed at least as well as students chatting with human tutors on each learning outcome we measured. In fact, students who received support from LearnLM were 5.5 percentage points more likely to solve novel problems on subsequent topics (with a success rate of 66.2%) than those who received tutoring from human tutors alone (rate of 60.7%). In interviews, tutors highlighted LearnLM's strength at drafting Socratic questions that encouraged deeper reflection from students, with multiple tutors even reporting that they learned new pedagogical practices from the model. Overall, our results suggest that pedagogically fine-tuned AI tutoring systems may play a promising role in delivering effective, individualized learning support at scale.
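The 5.5 percentage-point gap quoted in the abstract is an absolute difference between the two success rates, not a relative one. A minimal Python sketch of that arithmetic, using only the two rates stated above (the function names are illustrative and do not come from the paper):

```python
def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two rates given in percent."""
    return round(rate_a - rate_b, 1)


def relative_gain(rate_a: float, rate_b: float) -> float:
    """Relative improvement of rate_a over rate_b, in percent."""
    return round(100 * (rate_a - rate_b) / rate_b, 1)


learnlm_rate = 66.2  # success rate with LearnLM support (from the abstract)
human_rate = 60.7    # success rate with human tutors alone (from the abstract)

print(percentage_point_gap(learnlm_rate, human_rate))  # 5.5
print(relative_gain(learnlm_rate, human_rate))         # 9.1
```

The relative figure is computed here for illustration only; the abstract itself reports just the percentage-point difference.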
Related papers
- PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors [66.56586559631516]
Large language models (LLMs) have potential as educational tutors. But different tutoring strategies benefit different student personalities. Despite this, current LLM tutoring systems do not take into account student personality traits.
arXiv Detail & Related papers (2026-01-13T10:17:26Z)
- Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study [3.976073625291173]
We analyze 50 randomly selected transcripts of college-student remote tutors assisting middle school students in mathematics. Using GPT-4, GPT-4o, GPT-4-turbo, Gemini-1.5-pro, and LearnLM, we assess tutors' application of two tutor skills: delivering effective praise and responding to student math errors.
arXiv Detail & Related papers (2025-06-20T18:13:33Z)
- From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning [82.50157695987558]
Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy. We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
arXiv Detail & Related papers (2025-05-21T15:00:07Z)
- Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues [46.60683274479208]
We introduce an approach to train large language models (LLMs) to generate tutor utterances that maximize the likelihood of student correctness. We show that tutor utterances generated by our model lead to significantly higher chances of correct student responses.
arXiv Detail & Related papers (2025-03-09T03:38:55Z)
- Beyond Final Answers: Evaluating Large Language Models for Math Tutoring [0.24197860834245388]
We present two approaches to evaluate the correctness and quality of Large Language Models (LLMs) in math tutoring contexts. The first approach uses an intelligent tutoring system for college algebra as a testbed to assess LLM problem-solving capabilities. The second approach evaluates LLMs as tutors rather than problem solvers.
arXiv Detail & Related papers (2025-02-23T15:43:45Z)
- Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students [53.20318273452059]
Large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education. Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs. We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.
arXiv Detail & Related papers (2024-11-27T19:19:34Z)
- Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure [36.83786872708736]
One-to-one tutoring is one of the most efficient methods of teaching. We develop StratL, an algorithm to optimize LLM prompts and steer it to follow a predefined multi-turn tutoring plan represented as a transition graph. As a case study, we create a prototype tutor for high school math following Productive Failure (PF), an advanced and effective learning design.
arXiv Detail & Related papers (2024-10-03T16:15:41Z)
- Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z)
- Integrating AI Tutors in a Programming Course [0.0]
RAGMan is an LLM-powered tutoring system that can support a variety of course-specific and homework-specific AI tutors.
This paper describes the interactions the students had with the AI tutors, the students' feedback, and a comparative grade analysis.
arXiv Detail & Related papers (2024-07-14T00:42:39Z)
- MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems [74.73881579517055]
We propose a framework to generate such dialogues by pairing human teachers with a Large Language Model prompted to represent common student errors.
We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues.
arXiv Detail & Related papers (2023-05-23T21:44:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.