Related papers: Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

URL: http://arxiv.org/abs/2507.20335v1
Date: Sun, 27 Jul 2025 15:56:29 GMT
Title: Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning
Authors: Siyu Song, Wentao Liu, Ye Lu, Ruohua Zhang, Tao Liu, Jinze Lv, Xinyun Wang, Aimin Zhou, Fei Tan, Bo Jiang, Hao Hao,
Abstract summary: EduAlign is a framework designed to guide large language models (LLMs) toward becoming more effective and responsible educational assistants.<n>In the first stage, we curate a dataset of 8k educational interactions and annotate them-both manually and automatically-along three key educational dimensions: Helpfulness, Personalization, and Creativity.<n>In the second stage, we leverage HPC-RM as a reward signal to fine-tune a pre-trained LLM using Group Relative Policy Optimization (GRPO) on a set of 2k diverse prompts.
Score: 17.558663729465692
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of large language models (LLMs) into education presents unprecedented opportunities for scalable personalized learning. However, standard LLMs often function as generic information providers, lacking alignment with fundamental pedagogical principles such as helpfulness, student-centered personalization, and creativity cultivation. To bridge this gap, we propose EduAlign, a novel framework designed to guide LLMs toward becoming more effective and responsible educational assistants. EduAlign consists of two main stages. In the first stage, we curate a dataset of 8k educational interactions and annotate them-both manually and automatically-along three key educational dimensions: Helpfulness, Personalization, and Creativity (HPC). These annotations are used to train HPC-RM, a multi-dimensional reward model capable of accurately scoring LLM outputs according to these educational principles. We further evaluate the consistency and reliability of this reward model. In the second stage, we leverage HPC-RM as a reward signal to fine-tune a pre-trained LLM using Group Relative Policy Optimization (GRPO) on a set of 2k diverse prompts. We then assess the pre- and post-finetuning models on both educational and general-domain benchmarks across the three HPC dimensions. Experimental results demonstrate that the fine-tuned model exhibits significantly improved alignment with pedagogical helpfulness, personalization, and creativity stimulation. This study presents a scalable and effective approach to aligning LLMs with nuanced and desirable educational traits, paving the way for the development of more engaging, pedagogically aligned AI tutors.

Related papers

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning [76.09281171131941]
Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy.<n>We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
arXiv Detail & Related papers (2025-05-21T15:00:07Z)
Fine-Tuning Large Language Models for Educational Support: Leveraging Gagne's Nine Events of Instruction for Lesson Planning [5.022835754140817]
This study investigates how large language models (LLMs) can enhance teacher preparation by incorporating them with Gagne's Nine Events of Instruction.<n>The research starts with creating a comprehensive dataset based on math curriculum standards and Gagne's instructional events.<n>The second method uses specialized datasets to fine-tune open-source models, enhancing their educational content generation and analysis capabilities.
arXiv Detail & Related papers (2025-03-12T11:22:13Z)
Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues [46.60683274479208]
We introduce an approach to train large language models (LLMs) to generate tutor utterances that maximize the likelihood of student correctness.<n>We show that tutor utterances generated by our model lead to significantly higher chances of correct student responses.
arXiv Detail & Related papers (2025-03-09T03:38:55Z)
Advantage-Guided Distillation for Preference Alignment in Small Language Models [37.1672515839325]
We propose to utilize a well-aligned teacher LLM to guide the alignment process for Small Language Models.<n>Our experimental results show that these two approaches appreciably improve the alignment of SLMs and narrow the performance gap with larger counterparts.
arXiv Detail & Related papers (2025-02-25T07:47:22Z)
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.<n>Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.<n>We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System [54.71619734800526]
GenMentor is a multi-agent framework designed to deliver goal-oriented, personalized learning within ITS.<n>It maps learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset.<n>GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs.
arXiv Detail & Related papers (2025-01-27T03:29:44Z)
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost. This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails [43.19453208130667]
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. We create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer.
arXiv Detail & Related papers (2024-02-14T14:53:56Z)
Pedagogical Alignment of Large Language Models [24.427653091950994]
Large Language Models (LLMs) provide immediate answers rather than guiding students through the problem-solving process. This paper investigates Learning from Human Preferences (LHP) algorithms to achieve this alignment objective.
arXiv Detail & Related papers (2024-02-07T16:15:59Z)
Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs [13.262711792955377]
This study explores the effectiveness of Large Language Models (LLMs) for automated essay scoring. We propose an open-source LLM-based AES system, inspired by the dual-process theory. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders.
arXiv Detail & Related papers (2024-01-12T07:50:10Z)
SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision. We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.