Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
- URL: http://arxiv.org/abs/2508.06583v1
- Date: Fri, 08 Aug 2025 01:02:44 GMT
- Title: Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs
- Authors: Ying Liu, Can Li, Ting Zhang, Mei Wang, Qiannan Zhu, Jian Li, Hua Huang
- Abstract summary: This study shifts focus from mere question generation to the broader instructional guidance capability. We propose GuideEval, a benchmark grounded in authentic educational dialogues. Empirical findings reveal that existing LLMs frequently fail to provide effective adaptive scaffolding.
- Score: 34.94756659609455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The conversational capabilities of large language models hold significant promise for enabling scalable and interactive tutoring. While prior research has primarily examined their capacity for Socratic questioning, it often overlooks a critical dimension: adaptively guiding learners based on their cognitive states. This study shifts focus from mere question generation to the broader instructional guidance capability. We ask: Can LLMs emulate expert tutors who dynamically adjust strategies in response to learners' understanding? To investigate this, we propose GuideEval, a benchmark grounded in authentic educational dialogues that evaluates pedagogical guidance through a three-phase behavioral framework: (1) Perception, inferring learner states; (2) Orchestration, adapting instructional strategies; and (3) Elicitation, stimulating proper reflections. Empirical findings reveal that existing LLMs frequently fail to provide effective adaptive scaffolding when learners exhibit confusion or require redirection. Furthermore, we introduce a behavior-guided finetuning strategy that leverages behavior-prompted instructional dialogues, significantly enhancing guidance performance. By shifting the focus from isolated content evaluation to learner-centered interaction, our work advocates a more dialogic paradigm for evaluating Socratic LLMs.
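The abstract describes a three-phase behavioral framework (Perception, Orchestration, Elicitation) for evaluating pedagogical guidance. A minimal sketch of how such per-turn behavioral annotations could be aggregated into guidance scores is shown below; all names, data structures, and the scoring metric are illustrative assumptions, not the authors' actual GuideEval implementation.

```python
# Illustrative sketch of aggregating per-turn behavioral labels for the
# three phases described in the abstract: Perception (inferring learner
# states), Orchestration (adapting strategies), Elicitation (stimulating
# reflection). All names and the metric are hypothetical.
from dataclasses import dataclass

PHASES = ("perception", "orchestration", "elicitation")

@dataclass
class TurnAnnotation:
    """Behavioral labels for one tutor turn in a dialogue."""
    perception: bool      # did the tutor infer the learner's state?
    orchestration: bool   # did the tutor adapt its instructional strategy?
    elicitation: bool     # did the tutor prompt proper reflection?

def guidance_scores(turns: list[TurnAnnotation]) -> dict[str, float]:
    """Fraction of tutor turns exhibiting each behavior (toy metric)."""
    if not turns:
        return {p: 0.0 for p in PHASES}
    n = len(turns)
    return {p: sum(getattr(t, p) for t in turns) / n for p in PHASES}

# Example: a three-turn dialogue where the tutor twice perceives the
# learner's state but adapts strategy and elicits reflection only once each.
turns = [
    TurnAnnotation(perception=True, orchestration=True, elicitation=False),
    TurnAnnotation(perception=True, orchestration=False, elicitation=True),
    TurnAnnotation(perception=False, orchestration=False, elicitation=False),
]
print(guidance_scores(turns))
```

In this toy setup, low orchestration and elicitation fractions would correspond to the abstract's finding that models often fail to adapt when learners are confused or need redirection.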
Related papers
- Letting Tutor Personas "Speak Up" for LLMs: Learning Steering Vectors from Dialogue via Preference Optimization [45.40380629269521]
We show how tutor personas embedded in human tutor-student dialogues can be used to guide LLM behavior without relying on explicitly prompted instructions. We find that a steering vector captures tutor-specific variation across dialogue contexts, improving semantic alignment with ground-truth tutor utterances and increasing preference-based evaluations.
arXiv Detail & Related papers (2026-02-07T17:44:07Z) - EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus [59.693733170193944]
We present EduDial, a comprehensive multi-turn teacher-student dialogue dataset. EduDial covers 345 core knowledge points and consists of 34,250 dialogue sessions generated through interactions between teacher and student agents.
arXiv Detail & Related papers (2025-10-14T18:18:43Z) - Exploring Conversational Design Choices in LLMs for Pedagogical Purposes: Socratic and Narrative Approaches for Improving Instructor's Teaching Practice [24.54129847914925]
We evaluate TeaPT, a large language model that supports instructors' professional development through two conversational approaches: a Socratic approach that uses guided questioning to foster reflection, and a Narrative approach that offers elaborated suggestions to extend externalized cognition. Less-experienced, AI-optimistic instructors favored the Socratic version, whereas more-experienced, AI-cautious instructors preferred the Narrative version.
arXiv Detail & Related papers (2025-09-15T16:33:37Z) - SimInstruct: A Responsible Tool for Collecting Scaffolding Dialogues Between Experts and LLM-Simulated Novices [21.67295740032255]
SimInstruct is a scalable, expert-in-the-loop tool for collecting scaffolding dialogues. Using teaching development coaching as an example domain, SimInstruct simulates novice instructors via LLMs. Our results reveal that persona traits, such as extroversion and introversion, meaningfully influence how experts engage.
arXiv Detail & Related papers (2025-08-06T13:16:10Z) - Dialogic Pedagogy for Large Language Models: Aligning Conversational AI with Proven Theories of Learning [1.2691047660244332]
Large Language Models (LLMs) are transforming education by enabling rich conversational learning experiences. This article provides a review of how LLM-based conversational agents are being used in higher education.
arXiv Detail & Related papers (2025-06-24T10:19:09Z) - Improving Student-AI Interaction Through Pedagogical Prompting: An Example in Computer Science Education [1.1517315048749441]
Large language model (LLM) applications have sparked both excitement and concern. Recent studies consistently highlight how students' (mis)use of LLMs can hinder learning outcomes. This work aims to teach students how to effectively prompt LLMs to improve their learning.
arXiv Detail & Related papers (2025-06-23T20:39:17Z) - A Practical Guide for Supporting Formative Assessment and Feedback Using Generative AI [0.0]
Large language models (LLMs) can help students, teachers, and peers understand "where learners are going," "where learners currently are," and "how to move learners forward." This review provides a comprehensive foundation for integrating LLMs into formative assessment in a pedagogically informed manner.
arXiv Detail & Related papers (2025-05-29T12:52:43Z) - From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning [76.09281171131941]
Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy. We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
arXiv Detail & Related papers (2025-05-21T15:00:07Z) - Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z) - From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions.
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z) - From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
We investigate how instruction tuning adjusts pre-trained models with a focus on intrinsic changes.
The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models.
Our findings reveal three significant impacts of instruction tuning.
arXiv Detail & Related papers (2023-09-30T21:16:05Z) - Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce Re2, a simple yet general and effective prompting method that enhances the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z) - Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.