Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment
- URL: http://arxiv.org/abs/2507.05528v1
- Date: Mon, 07 Jul 2025 22:56:37 GMT
- Authors: Jiahuan Pei, Fanghua Ye, Xin Sun, Wentao Deng, Koen Hindriks, Junxiao Wang,
- Abstract summary: Large language models (LLMs) have advanced virtual educators and learners, bridging NLP with AI4Education. We propose WikiHowAgent, a multi-agent workflow leveraging LLMs to simulate interactive teaching-learning conversations. It integrates teacher and learner agents, an interaction manager, and an evaluator to facilitate procedural learning and assess pedagogic quality.
- Score: 11.527716245790828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have advanced virtual educators and learners, bridging NLP with AI4Education. Existing work often lacks scalability and fails to leverage diverse, large-scale course content, with limited frameworks for assessing pedagogic quality. To this end, we propose WikiHowAgent, a multi-agent workflow leveraging LLMs to simulate interactive teaching-learning conversations. It integrates teacher and learner agents, an interaction manager, and an evaluator to facilitate procedural learning and assess pedagogic quality. We introduce a dataset of 114,296 teacher-learner conversations grounded in 14,287 tutorials across 17 domains and 727 topics. Our evaluation protocol combines computational and rubric-based metrics with human judgment alignment. Results demonstrate the workflow's effectiveness in diverse setups, offering insights into LLM capabilities across domains. Our datasets and implementations are fully open-sourced.
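The workflow described in the abstract (teacher agent, learner agent, interaction manager, evaluator) can be pictured with a minimal sketch. This is not the authors' implementation: every name below (`Turn`, `run_conversation`, `evaluate`, the stub functions) is hypothetical, the LLM calls are replaced by deterministic stand-ins, and the evaluator is a toy coverage check rather than the paper's computational and rubric-based metrics.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Turn:
    speaker: str  # "teacher" or "learner"
    text: str


def run_conversation(steps: List[str], teacher_llm: LLM, learner_llm: LLM) -> List[Turn]:
    """Toy interaction manager: one teacher turn and one learner turn per tutorial step."""
    turns: List[Turn] = []
    for step in steps:
        explanation = teacher_llm(f"Explain this step to a learner: {step}")
        turns.append(Turn("teacher", explanation))
        response = learner_llm(f"Respond as a learner to: {explanation}")
        turns.append(Turn("learner", response))
    return turns


def evaluate(turns: List[Turn], steps: List[str]) -> float:
    """Toy evaluator: fraction of tutorial steps that appear in the teacher's turns."""
    teacher_text = " ".join(t.text for t in turns if t.speaker == "teacher")
    covered = sum(1 for step in steps if step in teacher_text)
    return covered / len(steps) if steps else 0.0


# Deterministic stubs so the sketch runs without any model or network access.
def stub_teacher(prompt: str) -> str:
    return "Teacher: " + prompt.split(": ", 1)[1]


def stub_learner(prompt: str) -> str:
    return "Got it, thanks."


steps = ["Boil water", "Add tea leaves", "Steep for three minutes"]
turns = run_conversation(steps, stub_teacher, stub_learner)
score = evaluate(turns, steps)
```

Swapping the stubs for real model calls would leave the manager loop unchanged, which is the point of separating the agents from the interaction manager in this sketch.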
Related papers
- Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale [9.092920230987684]
The integration of large language models into educational tools has the potential to substantially impact how teachers plan instruction. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages.
arXiv Detail & Related papers (2025-07-23T23:23:38Z)
- Benchmarking the Pedagogical Knowledge of Large Language Models [4.417539128489408]
This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their pedagogical knowledge. These benchmarks are built on a carefully curated set of questions sourced from professional development exams for teachers. We report results for 97 models, with accuracies spanning a range from 28% to 89% on the pedagogical knowledge questions.
arXiv Detail & Related papers (2025-06-23T14:49:01Z)
- LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning [58.98865450345401]
We introduce LecEval, an automated metric grounded in Mayer's Cognitive Theory of Multimedia Learning. LecEval assesses effectiveness using four rubrics: Content Relevance (CR), Expressive Clarity (EC), Logical Structure (LS) and Audience Engagement (AE). We curate a large-scale dataset of over 2,000 slides from more than 50 online course videos, annotated with fine-grained human ratings.
arXiv Detail & Related papers (2025-05-04T12:06:47Z)
- EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework [9.76455227840645]
Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios.
arXiv Detail & Related papers (2025-04-21T07:48:20Z)
- Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z)
- Simulating Classroom Education with LLM-Empowered Agents [48.26286735827104]
Large language models (LLMs) have been applied across various intelligent educational tasks to assist teaching. We propose SimClass, a multi-agent classroom simulation teaching framework. We recognize representative class roles and introduce a novel class control mechanism for automatic classroom teaching.
arXiv Detail & Related papers (2024-06-27T14:51:07Z)
- Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs [1.6961276655027102]
Our work transforms programming videos into one-on-one tutoring experiences using the cognitive apprenticeship framework.
Tutorly, developed as a JupyterLab plugin, allows learners to set personalized learning goals.
arXiv Detail & Related papers (2024-05-21T17:17:34Z)
- Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.