Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale
- URL: http://arxiv.org/abs/2507.17985v2
- Date: Mon, 28 Jul 2025 22:35:44 GMT
- Title: Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale
- Authors: Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun
- Abstract summary: The integration of large language models into educational tools has the potential to substantially impact how teachers plan instruction. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages.
- Score: 9.092920230987684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of large language models (LLMs) into educational tools has the potential to substantially impact how teachers plan instruction, support diverse learners, and engage in professional reflection. Yet little is known about how educators actually use these tools in practice and how their interactions with AI can be meaningfully studied at scale. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages drawn from a generative AI platform used by K-12 teachers. Through a four-phase coding pipeline, we combined inductive theme discovery, codebook development, structured annotation, and model benchmarking to examine patterns of educator engagement and evaluate the performance of LLMs in qualitative coding tasks. We developed a hierarchical codebook aligned with established teacher evaluation frameworks, capturing educators' instructional goals, contextual needs, and pedagogical strategies. Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably support theme identification, extend human recognition in complex scenarios, and outperform open-weight models in both accuracy and structural reliability. The analysis also reveals substantive patterns in how educators prompt AI to enhance instructional practices (79.7 percent of total conversations), create or adapt content (76.1 percent), support assessment and feedback loops (46.9 percent), attend to student needs for tailored instruction (43.3 percent), and assist with other professional responsibilities (34.2 percent), highlighting emerging AI-related competencies that have direct implications for teacher preparation and professional development. This study offers a scalable, transparent model for AI-augmented qualitative research and provides foundational insights into the evolving role of generative AI in educational practice.
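To make the structured-annotation phase concrete, the sketch below shows how an LLM such as Claude 3.5 Haiku might be prompted to apply a hierarchical codebook to a single educator message and return JSON labels. This is a minimal illustration assuming the Anthropic Python SDK; the toy codebook, prompt wording, and `annotate` helper are hypothetical and do not reproduce the authors' actual pipeline.

```python
"""Sketch of LLM-assisted qualitative coding (structured annotation phase).
The codebook snippet and prompt are illustrative assumptions, not the
paper's released codebook or prompts."""
import json

import anthropic

# Toy two-level codebook; the paper's full codebook is hierarchical and
# aligned with established teacher evaluation frameworks.
CODEBOOK = {
    "instructional_practice": ["lesson_planning", "differentiation"],
    "content_creation": ["new_material", "adaptation"],
    "assessment_feedback": ["rubric_design", "feedback_drafting"],
}

PROMPT = (
    "You are a qualitative coder. Apply the codebook below to the educator "
    "message and answer with JSON only, in the form "
    '{{"theme": "<top-level code>", "subtheme": "<second-level code>"}}.\n\n'
    "Codebook: {codebook}\n\nEducator message: {message}"
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def annotate(message: str) -> dict:
    """Request one structured annotation; json.loads raises on malformed
    output, which a benchmarking step could count as a structural failure."""
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": PROMPT.format(
                codebook=json.dumps(CODEBOOK), message=message
            ),
        }],
    )
    return json.loads(response.content[0].text)


if __name__ == "__main__":
    print(annotate("Help me adapt this fractions worksheet for my ELL students."))
```

In the model-benchmarking phase, labels produced this way would then be compared against human annotations, with label agreement informing accuracy and malformed or schema-violating output counting against structural reliability.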
Related papers
- Benchmarking the Pedagogical Knowledge of Large Language Models [4.417539128489408]
This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their pedagogical knowledge. These benchmarks are built on a carefully curated set of questions sourced from professional development exams for teachers. We report results for 97 models, with accuracies ranging from 28% to 89% on the pedagogical knowledge questions.
arXiv Detail & Related papers (2025-06-23T14:49:01Z)
- Enhancing AI-Driven Education: Integrating Cognitive Frameworks, Linguistic Feedback Analysis, and Ethical Considerations for Improved Content Generation [0.0]
This paper synthesizes insights from four related studies to propose a comprehensive framework for enhancing AI-driven educational tools. We integrate cognitive assessment frameworks, linguistic analysis of AI-generated feedback, and ethical design principles to guide the development of effective and responsible AI tools.
arXiv Detail & Related papers (2025-05-01T06:36:21Z)
- EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework [9.76455227840645]
Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios.
arXiv Detail & Related papers (2025-04-21T07:48:20Z)
- Connecting Feedback to Choice: Understanding Educator Preferences in GenAI vs. Human-Created Lesson Plans in K-12 Education -- A Comparative Analysis [11.204345070162592]
Generative AI (GenAI) models are increasingly explored for educational applications. This study compares lesson plans authored by human curriculum designers, a fine-tuned LLaMA-2-13b model trained on K-12 content, and a customized GPT-4 model. Using a large-scale preference study with K-12 math educators, we examine how preferences vary across grade levels and instructional components.
arXiv Detail & Related papers (2025-04-07T19:28:19Z)
- LLM Agents for Education: Advances and Applications [49.3663528354802]
Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. This survey aims to provide a comprehensive technological overview of LLM agents for education, fostering further research and collaboration to enhance their impact for the greater good of learners and educators alike.
arXiv Detail & Related papers (2025-03-14T11:53:44Z)
- Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues [46.60683274479208]
We introduce an approach to train large language models (LLMs) to generate tutor utterances that maximize the likelihood of student correctness. We show that tutor utterances generated by our model lead to significantly higher chances of correct student responses.
arXiv Detail & Related papers (2025-03-09T03:38:55Z)
- MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors [76.1634959528817]
We present MathTutorBench, an open-source benchmark for holistic tutoring model evaluation. MathTutorBench contains datasets and metrics that broadly cover tutor abilities as defined by learning sciences research in dialog-based teaching. We evaluate a wide set of closed- and open-weight models and find that subject expertise, indicated by solving ability, does not immediately translate to good teaching.
arXiv Detail & Related papers (2025-02-26T08:43:47Z)
- Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts [13.617709093240231]
We examine how artificial intelligence (AI) and machine learning (ML) methods can analyze educational content, teacher discourse, and student responses to foster instructional improvement.
We identify key areas where AI/ML integration offers significant advantages, including teacher coaching, student support, and content development.
This paper emphasizes the importance of aligning AI/ML technologies with pedagogical goals to realize their full potential in educational settings.
arXiv Detail & Related papers (2024-03-06T18:29:18Z)
- Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
- Opportunities and Challenges in Neural Dialog Tutoring [54.07241332881601]
We rigorously analyze various generative language models on two dialog tutoring datasets for language learning.
We find that although current approaches can model tutoring in constrained learning scenarios, they perform poorly in less constrained scenarios.
Our human quality evaluation shows that both models and ground-truth annotations exhibit low performance in terms of equitable tutoring.
arXiv Detail & Related papers (2023-01-24T11:00:17Z)
- Modelling Assessment Rubrics through Bayesian Networks: a Pragmatic Approach [40.06500618820166]
This paper presents an approach to deriving a learner model directly from an assessment rubric.
We illustrate how the approach can be applied to automate the human assessment of an activity developed for testing computational thinking skills.
arXiv Detail & Related papers (2022-09-07T10:09:12Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows the benefits of AI explanations as interfaces for machine teaching, supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks: an anchoring effect on the model's judgments and increased cognitive workload.
arXiv Detail & Related papers (2020-01-24T22:52:18Z)