CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation
- URL: http://arxiv.org/abs/2508.08386v1
- Date: Mon, 11 Aug 2025 18:13:31 GMT
- Title: CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation
- Authors: Shuzhou Yuan, William LaCroix, Hardik Ghoshal, Ercong Nie, Michael Färber,
- Abstract summary: Large Language Models (LLMs) are increasingly employed as AI tutors due to their scalability and potential for personalized instruction. We introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance.
- Score: 8.901227918730562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly employed as AI tutors due to their scalability and potential for personalized instruction. However, off-the-shelf LLMs often underperform in educational settings: they frequently reveal answers too readily, fail to adapt their responses to student uncertainty, and remain vulnerable to emotionally manipulative prompts. To address these challenges, we introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought (CoT) data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance. Furthermore, we design targeted dialogue cases to explicitly mitigate three key limitations: over-compliance, low response adaptivity, and threat vulnerability. We fine-tune four open-source LLMs on different variants of the augmented datasets and evaluate them in simulated educational scenarios using both automatic metrics and LLM-as-a-judge assessments. Our results show that models fine-tuned with CoDAE deliver more pedagogically appropriate guidance, better support reasoning processes, and effectively resist premature answer disclosure.
Related papers
- UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models [59.693733170193944]
Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings. Recent reinforcement learning approaches address this limitation but face two critical challenges. We propose the Unidirectional Cognitive Optimization (UCO) method to address these challenges.
arXiv Detail & Related papers (2025-11-12T01:27:02Z)
- CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs [81.79228604962687]
This work investigates whether small-scale LMs can benefit from instruction tuning. We compare conversational and question-answering instruction tuning datasets, applied either in a merged or sequential curriculum. Results show that instruction tuning yields small but consistent gains in fine-tuning scenarios, with sequential curricula outperforming merged data. However, improvements do not consistently transfer to zero-shot tasks, suggesting a trade-off between interaction-focused adaptation and broad linguistic generalization.
arXiv Detail & Related papers (2025-10-29T10:36:39Z)
- Rethinking Visual Intelligence: Insights from Video Pretraining [75.32388528274224]
Large language models (LLMs) have demonstrated that large-scale pretraining enables systems to adapt rapidly to new problems. We investigate Video Diffusion Models (VDMs) as a promising direction for bridging the gap.
arXiv Detail & Related papers (2025-10-28T14:12:11Z)
- TeachLM: Post-Training LLMs for Education Using Authentic Learning Data [4.600044635815686]
TeachLM is a large language model optimized for teaching using parameter-efficient fine-tuning of state-of-the-art models. We use parameter-efficient fine-tuning to develop an authentic student model that enables the generation of high-fidelity synthetic student-tutor dialogues. Our evaluations demonstrate that fine-tuning on authentic learning data significantly improves conversational and pedagogical performance.
arXiv Detail & Related papers (2025-10-06T17:55:04Z)
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners. In this work, we introduce a new task paradigm: proactive information gathering. We design a scalable framework that generates partially specified, real-world tasks, masking key information. Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z)
- TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs [50.820065021136024]
DeepSeek R1 has significantly advanced complex reasoning for large language models (LLMs). Recent methods have attempted to replicate R1's reasoning capabilities in multimodal settings. We propose TACO, a novel reinforcement learning algorithm for visual reasoning.
arXiv Detail & Related papers (2025-05-27T06:30:48Z)
- From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning [76.09281171131941]
Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy. We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
arXiv Detail & Related papers (2025-05-21T15:00:07Z)
- Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring [0.0]
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels.
arXiv Detail & Related papers (2025-05-13T08:50:57Z)
- From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System [49.57258257916805]
Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities. Practical applications often favor smaller, internally managed recommender models due to scalability, interpretability, and data privacy constraints. We propose an active data augmentation framework that synthesizes conversational training data by leveraging black-box LLMs guided by active learning techniques.
arXiv Detail & Related papers (2025-04-21T23:05:47Z)
- Can Large Language Models Match Tutoring System Adaptivity? A Benchmarking Study [0.0]
Large Language Models (LLMs) hold promise as dynamic instructional aids. Yet, it remains unclear whether LLMs can replicate the adaptivity of intelligent tutoring systems (ITS).
arXiv Detail & Related papers (2025-04-07T23:57:32Z)
- Supervised Fine-Tuning LLMs to Behave as Pedagogical Agents in Programming Education [41.69192181482715]
We present the development of GuideLM, a fine-tuned large language model (LLM) for programming education. GuideLM has been integrated into the Debugging C Compiler (DCC), an educational C compiler that leverages LLMs to generate pedagogically sound error explanations. We conducted an expert analysis of 400 responses per model, comparing their pedagogical effectiveness against base OpenAI models. Results indicate that GuideLM and GuideLM-mini improve pedagogical performance, with an 8% increase in Socratic guidance and a 58% improvement in economy of words compared to GPT-4o.
arXiv Detail & Related papers (2025-02-27T21:23:56Z)
- Teaching Models to Improve on Tape [30.330699770714165]
Large Language Models (LLMs) often struggle when prompted to generate content under specific constraints.
Recent works have shown that LLMs can benefit from such "corrective feedback".
We introduce an RL framework for teaching models to use such rewards, by simulating interaction sessions, and rewarding the model according to its ability to satisfy the constraints.
arXiv Detail & Related papers (2024-11-03T08:49:55Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Soft Prompting for Unlearning in Large Language Models [11.504012974208466]
This work focuses on investigating machine unlearning for Large Language Models motivated by data protection regulations.
We propose a framework, Soft Prompting for Unlearning (SPUL).
We conduct a rigorous evaluation of the proposed method and our results indicate that SPUL can significantly improve the trade-off between utility and forgetting.
arXiv Detail & Related papers (2024-06-17T19:11:40Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.