EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus
- URL: http://arxiv.org/abs/2510.12899v1
- Date: Tue, 14 Oct 2025 18:18:43 GMT
- Title: EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus
- Authors: Shouang Wei, Min Zhang, Xin Lin, Bo Jiang, Zhongxiang Dai, Kun Kuang,
- Abstract summary: We present EduDial, a comprehensive multi-turn teacher-student dialogue dataset.<n>EduDial covers 345 core knowledge points and consists of 34,250 dialogue sessions generated through interactions between teacher and student agents.
- Score: 59.693733170193944
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, several multi-turn dialogue benchmarks have been proposed to evaluate the conversational abilities of large language models (LLMs). As LLMs are increasingly recognized as a key technology for advancing intelligent education, owing to their ability to deeply understand instructional contexts and provide personalized guidance, the construction of dedicated teacher-student dialogue benchmarks has become particularly important. To this end, we present EduDial, a comprehensive multi-turn teacher-student dialogue dataset. EduDial covers 345 core knowledge points and consists of 34,250 dialogue sessions generated through interactions between teacher and student agents. Its design is guided by Bloom's taxonomy of educational objectives and incorporates ten questioning strategies, including situational questioning, zone of proximal development (ZPD) questioning, and metacognitive questioning-thus better capturing authentic classroom interactions. Furthermore, we design differentiated teaching strategies for students at different cognitive levels, thereby providing more targeted teaching guidance. Building on EduDial, we further develop EduDial-LLM 32B via training and propose an 11-dimensional evaluation framework that systematically measures the teaching abilities of LLMs, encompassing both overall teaching quality and content quality. Experiments on 17 mainstream LLMs reveal that most models struggle in student-centered teaching scenarios, whereas our EduDial-LLM achieves significant gains, consistently outperforming all baselines across all metrics. The code is available at https://github.com/Mind-Lab-ECNU/EduDial/tree/main.
Related papers
- EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education [11.130206904690745]
We introduce EduEval, a comprehensive hierarchical benchmark for evaluating large language models (LLMs) in Chinese K-12 education.<n>EduEval comprises 24 distinct task types with over 11,000 questions spanning primary to high school levels.<n>We evaluate 14 leading LLMs under both zero-shot and few-shot settings, revealing that while models perform well on factual tasks, they struggle with classroom dialogue classification and exhibit inconsistent results in creative content generation.
arXiv Detail & Related papers (2025-11-29T03:09:50Z) - SID: Benchmarking Guided Instruction Capabilities in STEM Education with a Socratic Interdisciplinary Dialogues Dataset [7.233293220739224]
We introduce SID, the first benchmark designed to evaluate the higher-order guidance capabilities of LLMs.<n>Our contributions include a large-scale dataset of 10,000 dialogue turns across 48 complex STEM projects.<n> Baseline experiments confirm that even state-of-the-art LLMs struggle to execute effective guided dialogues.
arXiv Detail & Related papers (2025-08-06T15:49:26Z) - Benchmarking the Pedagogical Knowledge of Large Language Models [4.417539128489408]
This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their pedagogical knowledge.<n>These benchmarks are built on a carefully curated set of questions sourced from professional development exams for teachers.<n>We report results for 97 models, with accuracies spanning a range from 28% to 89% on the pedagogical knowledge questions.
arXiv Detail & Related papers (2025-06-23T14:49:01Z) - Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach [7.273857543125784]
Our study proposes a novel text-centered multimodal fusion architecture to assess the quality of three discourse components grounded in the Global Teaching InSights (GTI) observation protocol.<n>We employ attention mechanisms to capture inter- and intra-modal interactions from transcript, audio, and video streams.<n>Our results highlight the dominant role of text modality in approaching this task.
arXiv Detail & Related papers (2025-05-12T09:24:21Z) - Enhanced Classroom Dialogue Sequences Analysis with a Hybrid AI Agent: Merging Expert Rule-Base with Large Language Models [7.439914834067705]
This study develops a comprehensive rule base of dialogue sequences and an Artificial Intelligence (AI) agent.
The agent applies expert knowledge while adapting to the complexities of natural language, enabling accurate and flexible categorisation of classroom dialogue sequences.
arXiv Detail & Related papers (2024-11-13T08:13:41Z) - Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring.<n>We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue.<n>We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z) - MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues [58.33076950775072]
MT-Bench-101 is designed to evaluate the fine-grained abilities of Large Language Models (LLMs) in multi-turn dialogues.
We construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.
We then evaluate 21 popular LLMs based on MT-Bench-101, conducting comprehensive analyses from both ability and task perspectives.
arXiv Detail & Related papers (2024-02-22T18:21:59Z) - Opportunities and Challenges in Neural Dialog Tutoring [54.07241332881601]
We rigorously analyze various generative language models on two dialog tutoring datasets for language learning.
We find that although current approaches can model tutoring in constrained learning scenarios, they perform poorly in less constrained scenarios.
Our human quality evaluation shows that both models and ground-truth annotations exhibit low performance in terms of equitable tutoring.
arXiv Detail & Related papers (2023-01-24T11:00:17Z) - Neural Multi-Task Learning for Teacher Question Detection in Online
Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z) - Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue
Representation Learning [50.5572111079898]
Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, dialogue summarization etc.
While dialogue corpora are abundantly available, labeled data, for specific learning tasks, can be highly scarce and expensive.
In this work, we investigate dialogue context representation learning with various types unsupervised pretraining tasks.
arXiv Detail & Related papers (2020-02-27T04:36:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.