Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach
- URL: http://arxiv.org/abs/2505.07902v1
- Date: Mon, 12 May 2025 09:24:21 GMT
- Title: Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach
- Authors: Ruikun Hou, Babette Bühler, Tim Fütterer, Efe Bozkir, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci
- Abstract summary: Our study proposes a novel text-centered multimodal fusion architecture to assess the quality of three discourse components grounded in the Global Teaching InSights (GTI) observation protocol. We employ attention mechanisms to capture inter- and intra-modal interactions from transcript, audio, and video streams. Our results highlight the dominant role of text modality in approaching this task.
- Score: 7.273857543125784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classroom discourse is an essential vehicle through which teaching and learning take place. Assessing different characteristics of discursive practices and linking them to student learning achievement enhances the understanding of teaching quality. Traditional assessments rely on manual coding of classroom observation protocols, which is time-consuming and costly. Despite many studies utilizing AI techniques to analyze classroom discourse at the utterance level, investigations into the evaluation of discursive practices throughout an entire lesson segment remain limited. To address this gap, our study proposes a novel text-centered multimodal fusion architecture to assess the quality of three discourse components grounded in the Global Teaching InSights (GTI) observation protocol: Nature of Discourse, Questioning, and Explanations. First, we employ attention mechanisms to capture inter- and intra-modal interactions from transcript, audio, and video streams. Second, a multi-task learning approach is adopted to jointly predict the quality scores of the three components. Third, we formulate the task as an ordinal classification problem to account for rating level order. The effectiveness of these designed elements is demonstrated through an ablation study on the GTI Germany dataset containing 92 videotaped math lessons. Our results highlight the dominant role of text modality in approaching this task. Integrating acoustic features enhances the model's consistency with human ratings, achieving an overall Quadratic Weighted Kappa score of 0.384, comparable to human inter-rater reliability (0.326). Our study lays the groundwork for the future development of automated discourse quality assessment to support teacher professional development through timely feedback on multidimensional discourse practices.
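To make the three design elements concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: intra-modal self-attention on the transcript, inter-modal cross-attention in which text queries attend to the audio and video streams, one prediction head per discourse component trained jointly, ordinal ratings encoded as cumulative binary targets, and agreement checked with Quadratic Weighted Kappa. This is an illustrative reconstruction, not the authors' released code; the class name `TextCenteredFusion`, all dimensions and layer choices, and the four-level rating scale are assumptions.

```python
# Illustrative sketch only; hyperparameters are assumed, not from the paper.
import torch
import torch.nn as nn
from sklearn.metrics import cohen_kappa_score

class TextCenteredFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_levels=4, n_tasks=3):
        super().__init__()
        # Intra-modal self-attention over the transcript stream.
        self.text_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Inter-modal cross-attention: text queries attend to audio/video keys.
        self.audio_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.video_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One head per component (Nature of Discourse, Questioning, Explanations);
        # each emits n_levels - 1 cumulative logits, i.e. P(score > k).
        self.heads = nn.ModuleList(
            nn.Linear(3 * d_model, n_levels - 1) for _ in range(n_tasks))

    def forward(self, text, audio, video):
        t, _ = self.text_attn(text, text, text)      # intra-modal
        a, _ = self.audio_attn(text, audio, audio)   # text -> audio
        v, _ = self.video_attn(text, video, video)   # text -> video
        fused = torch.cat([t.mean(1), a.mean(1), v.mean(1)], dim=-1)
        return [head(fused) for head in self.heads]  # one logit set per task

model = TextCenteredFusion()
text, audio, video = (torch.randn(8, 50, 256) for _ in range(3))
logits_per_task = model(text, audio, video)

# Multi-task ordinal loss: a rating r becomes binary targets [r>0, r>1, r>2],
# so the loss respects the order of the rating levels.
bce = nn.BCEWithLogitsLoss()
ratings = torch.randint(0, 4, (8, 3))  # one rating per segment and component
loss = sum(bce(logits, (ratings[:, i:i + 1] > torch.arange(3)).float())
           for i, logits in enumerate(logits_per_task))

# Decode a level by counting positive cumulative logits, then score agreement
# with the Quadratic Weighted Kappa metric the abstract reports.
preds = torch.stack([(l > 0).sum(-1) for l in logits_per_task], dim=1)
qwk = cohen_kappa_score(preds.flatten().numpy(), ratings.flatten().numpy(),
                        weights="quadratic")
```

Counting positive cumulative logits recovers a discrete rating while preserving level order, which is what separates the ordinal formulation from a plain softmax over four unordered classes.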
Related papers
- AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation [55.607230723223346]
This work presents a systematic study of Large Audio Model (LAM) as a Judge, AudioJudge, investigating whether it can provide a unified evaluation framework that addresses both challenges. We explore AudioJudge across audio characteristic detection tasks, including pronunciation, speaking rate, speaker identification, and speech quality, and system-level human preference simulation for automated benchmarking. We introduce a multi-aspect ensemble AudioJudge to enable general-purpose multi-aspect audio evaluation. This method decomposes speech assessment into specialized judges for lexical content, speech quality, and paralinguistic features, achieving up to 0.91 Spearman correlation with human preferences.
arXiv Detail & Related papers (2025-07-17T00:39:18Z) - Automatic Large Language Models Creation of Interactive Learning Lessons [4.3668925518595225]
We develop a system capable of creating structured tutor training lessons. Our study generated lessons in English for three key topics: Encouraging Students' Independence, Encouraging Help-Seeking Behavior, and Turning on Cameras. Results demonstrate that the task decomposition strategy led to higher-rated lessons compared to single-step generation.
arXiv Detail & Related papers (2025-06-20T06:58:50Z) - Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey [64.08485471150486]
This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. We systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication.
arXiv Detail & Related papers (2025-03-28T14:08:40Z) - The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education [3.967610895056427]
This paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices.
We confront two challenges inherent in NLP-based instructional analysis: noisy, lengthy input data and highly skewed distributions of human ratings.
arXiv Detail & Related papers (2024-04-03T04:15:29Z) - Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT [7.273857543125784]
Our work explores a multimodal approach to automatically estimating encouragement and warmth in classrooms.
We employed facial and speech emotion recognition with sentiment analysis to extract interpretable features from video, audio, and transcript data.
We demonstrated our approach on the GTI dataset, comprising 367 16-minute video segments from 92 authentic lesson recordings.
arXiv Detail & Related papers (2024-04-01T16:58:09Z) - A Hierarchy-based Analysis Approach for Blended Learning: A Case Study with Chinese Students [12.533646830917213]
This paper proposes a hierarchy-based approach for blended learning evaluation.
The results show that cognitive engagement and emotional engagement play a more important role in blended learning evaluation.
arXiv Detail & Related papers (2023-09-19T00:09:00Z) - Utilizing Natural Language Processing for Automated Assessment of Classroom Discussion [0.7087237546722617]
In this work, we experimented with various modern natural language processing (NLP) techniques to automatically generate rubric scores for individual dimensions of classroom text discussion quality.
Despite the limited amount of data, our work shows encouraging results in some of the rubrics while suggesting that there is room for improvement in the others.
arXiv Detail & Related papers (2023-06-21T16:45:24Z) - Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, author prestige, and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework in which the noise level of the finetuning data decreases over successive stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z) - Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation [67.52229938775294]
In past years, researchers have proposed utilizing the teacher-student framework in their methods to decrease the domain gap between different person re-identification datasets.
Inspired by recent teacher-student framework based methods, we explore further ways to imitate the human learning process from different aspects.
arXiv Detail & Related papers (2021-11-28T01:14:29Z) - Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z) - Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems [3.269851859258154]
A common way of circumventing label-scarce problems is pre-training a model to learn representations of the contents of learning items.
We propose Assessment Modeling, a class of fundamental pre-training tasks for general interactive educational systems.
arXiv Detail & Related papers (2020-01-01T02:00:07Z)