Related papers: From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

URL: http://arxiv.org/abs/2505.15607v1
Date: Wed, 21 May 2025 15:00:07 GMT
Title: From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
Authors: David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan
Abstract summary: Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy. We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
Score: 76.09281171131941
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy, which requires strategically withholding answers. To mitigate this, we propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors using simulated student-tutor interactions by emphasizing pedagogical quality and guided problem-solving over simply giving away answers. We use our method to train a 7B-parameter tutor model without human annotations, which reaches similar performance to larger proprietary models like LearnLM. We introduce a controllable reward weighting to balance pedagogical support and student solving accuracy, allowing us to trace the Pareto frontier between these two objectives. Our models better preserve reasoning capabilities than single-turn SFT baselines and can optionally enhance interpretability through thinking tags that expose the model's instructional planning.

Related papers

PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors [66.56586559631516]
Large language models (LLMs) have potential as educational tutors. But different tutoring strategies benefit different student personalities. Despite this, current LLM tutoring systems do not take into account student personality traits.
arXiv Detail & Related papers (2026-01-13T10:17:26Z)
TeachLM: Post-Training LLMs for Education Using Authentic Learning Data [4.600044635815686]
TeachLM is a large language model optimized for teaching using parameter-efficient fine-tuning of state-of-the-art models. We use parameter-efficient fine-tuning to develop an authentic student model that enables the generation of high-fidelity synthetic student-tutor dialogues. Our evaluations demonstrate that fine-tuning on authentic learning data significantly improves conversational and pedagogical performance.
arXiv Detail & Related papers (2025-10-06T17:55:04Z)
CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation [8.901227918730562]
Large Language Models (LLMs) are increasingly employed as AI tutors due to their scalability and potential for personalized instruction. We introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance.
arXiv Detail & Related papers (2025-08-11T18:13:31Z)
Partnering with AI: A Pedagogical Feedback System for LLM Integration into Programming Education [19.441958600393342]
This paper introduces a novel framework for large language model (LLM)-driven feedback generation. Our findings suggest that teachers consider that, when aligned with the framework, LLMs can effectively support students. However, we found several limitations, such as its inability to adapt feedback to dynamic classroom contexts.
arXiv Detail & Related papers (2025-07-01T03:48:48Z)
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue. Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning. We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z)
Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues [46.60683274479208]
We introduce an approach to train large language models (LLMs) to generate tutor utterances that maximize the likelihood of student correctness. We show that tutor utterances generated by our model lead to significantly higher chances of correct student responses.
arXiv Detail & Related papers (2025-03-09T03:38:55Z)
The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
LLM-based Cognitive Models of Students with Misconceptions [55.29525439159345]
This paper investigates whether Large Language Models (LLMs) can be instruction-tuned to meet this dual requirement. We introduce MalAlgoPy, a novel Python library that generates datasets reflecting authentic student solution patterns. Our insights enhance our understanding of AI-based student models and pave the way for effective adaptive learning systems.
arXiv Detail & Related papers (2024-10-16T06:51:09Z)
Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure [36.83786872708736]
One-to-one tutoring is one of the most efficient methods of teaching. We develop StratL, an algorithm to optimize LLM prompts and steer it to follow a predefined multi-turn tutoring plan represented as a transition graph. As a case study, we create a prototype tutor for high school math following Productive Failure (PF), an advanced and effective learning design.
arXiv Detail & Related papers (2024-10-03T16:15:41Z)
LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement [93.38736019287224]
"LLMs-as-Instructors" framework autonomously enhances the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model. Within this framework, we implement two strategies: "Learning from Error," which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors.
arXiv Detail & Related papers (2024-06-29T17:16:04Z)
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails [43.19453208130667]
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. We create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer.
arXiv Detail & Related papers (2024-02-14T14:53:56Z)
Pedagogical Alignment of Large Language Models [24.427653091950994]
Large Language Models (LLMs) provide immediate answers rather than guiding students through the problem-solving process. This paper investigates Learning from Human Preferences (LHP) algorithms to achieve this alignment objective.
arXiv Detail & Related papers (2024-02-07T16:15:59Z)
RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions [10.34673089426247]
We propose a framework for optimizing teaching strategies by constructing a virtual model of the student. Our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.
arXiv Detail & Related papers (2021-07-31T15:42:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.