Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback
- URL: http://arxiv.org/abs/2310.01132v4
- Date: Tue, 16 Apr 2024 21:30:14 GMT
- Title: Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback
- Authors: Jacob Whitehill, Jennifer LoCasale-Crouch
- Abstract summary: Large Language Models (LLMs) can be used to estimate "Instructional Support" domain scores of the CLassroom Assessment Scoring System (CLASS).
We design a machine learning architecture that uses either zero-shot prompting of Meta's Llama2, and/or a classic Bag of Words (BoW) model, to classify individual utterances of teachers' speech.
These utterance-level judgments are aggregated over a 15-min observation session to estimate a global CLASS score.
- Score: 9.51494089949975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the aim to provide teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate ``Instructional Support'' domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses either zero-shot prompting of Meta's Llama2, and/or a classic Bag of Words (BoW) model, to classify individual utterances of teachers' speech (transcribed automatically using OpenAI's Whisper) for the presence of Instructional Support. Then, these utterance-level judgments are aggregated over a 15-min observation session to estimate a global CLASS score. Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson $R$ up to $0.48$) approaches human inter-rater reliability (up to $R=0.55$); (2) LLMs generally yield slightly greater accuracy than BoW for this task, though the best models often combined features extracted from both LLM and BoW; and (3) for classifying individual utterances, there is still room for improvement of automated methods compared to human-level judgments. Finally, (4) we illustrate how the model's outputs can be visualized at the utterance level to provide teachers with explainable feedback on which utterances were most positively or negatively correlated with specific CLASS dimensions.
Related papers
- Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT [7.273857543125784]
Our work explores a multimodal approach to automatically estimating encouragement and warmth in classrooms.
We employed facial and speech emotion recognition with sentiment analysis to extract interpretable features from video, audio, and transcript data.
We demonstrated our approach on the GTI dataset, comprising 367 16-minute video segments from 92 authentic lesson recordings.
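The multimodal approach above combines interpretable features from three sources. A minimal, hypothetical sketch of that fusion step (feature names and values are invented for illustration; the actual recognizers are not shown):

```python
# Hypothetical fusion of per-segment features from video (facial emotion),
# audio (speech emotion), and transcript (sentiment) into one vector.
def fuse_features(facial: dict, speech: dict, sentiment: dict) -> list[float]:
    """Concatenate modality features in a fixed (sorted-key) order."""
    vec: list[float] = []
    for modality in (facial, speech, sentiment):
        vec.extend(modality[k] for k in sorted(modality))
    return vec

segment = fuse_features(
    {"happy": 0.7, "neutral": 0.2},      # facial emotion probabilities
    {"warmth": 0.6},                     # speech emotion score
    {"positive": 0.8, "negative": 0.1},  # transcript sentiment
)
```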
arXiv Detail & Related papers (2024-04-01T16:58:09Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
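The idea of frozen semantic targets can be sketched as nearest-target classification: instead of matching one-hot labels, features are compared against fixed class embeddings. In this hypothetical sketch, a deterministic random vector per class name stands in for the PLM encoder:

```python
import math
import random

# Hypothetical stand-in for a pretrained language model (PLM) encoder: a
# deterministic random vector per class name plays the role of its embedding.
DIM = 8

def plm_embed(class_name: str) -> list[float]:
    rng = random.Random(class_name)  # deterministic per class name
    return [rng.gauss(0, 1) for _ in range(DIM)]

# Frozen semantic targets: generated once, never updated during training.
CLASSES = ["cat", "dog", "car"]
TARGETS = {c: plm_embed(c) for c in CLASSES}

def cosine(u: list[float], v: list[float]) -> float:
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def predict(feature: list[float]) -> str:
    """Classify by the nearest frozen semantic target under cosine
    similarity, replacing one-hot label matching."""
    return max(CLASSES, key=lambda c: cosine(feature, TARGETS[c]))
```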
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Combining EEG and NLP Features for Predicting Students' Lecture Comprehension using Ensemble Classification [0.7964328411060118]
The proposed framework includes EEG and NLP feature extraction, processing, and classification.
EEG and NLP features are extracted to construct integrated features obtained from recorded EEG signals and sentence-level syntactic analysis.
Experiment results show that this framework performs better than the baselines.
arXiv Detail & Related papers (2023-11-18T14:35:26Z)
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets [69.91340332545094]
We introduce FLASK, a fine-grained evaluation protocol for both human-based and model-based evaluation.
We experimentally observe that the fine-graininess of evaluation is crucial for attaining a holistic view of model performance.
arXiv Detail & Related papers (2023-07-20T14:56:35Z)
- A Global Model Approach to Robust Few-Shot SAR Automatic Target Recognition [6.260916845720537]
It may not always be possible to collect hundreds of labeled samples per class for training deep learning-based SAR Automatic Target Recognition (ATR) models.
This work specifically tackles the few-shot SAR ATR problem, where only a handful of labeled samples may be available to support the task of interest.
arXiv Detail & Related papers (2023-03-20T00:24:05Z)
- CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning [114.69155066932046]
This work proposes a novel Centralized Teacher with Decentralized Student (CTDS) framework, which consists of a teacher model and a student model.
Specifically, the teacher model allocates the team reward by learning individual Q-values conditioned on global observation.
The student model utilizes the partial observations to approximate the Q-values estimated by the teacher model.
arXiv Detail & Related papers (2022-03-16T06:03:14Z)
- Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency [26.70127591966917]
We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform equivalently to or better than the classification approach.
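The two formulations differ only in the output head: classification picks a grade label directly, while regression predicts a continuous score that is then snapped to the nearest grade. A hypothetical sketch of the regression route (the feature names and weights are invented for illustration):

```python
# Discrete proficiency grades that a classifier would predict directly.
GRADES = [1, 2, 3, 4, 5]

def regress_score(features: dict[str, float]) -> float:
    """Toy linear scorer over illustrative linguistic feature names."""
    weights = {"fluency": 1.5, "pronunciation": 1.0, "grammar": 1.2}
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

def grade_by_regression(features: dict[str, float]) -> int:
    """Regression formulation: predict a continuous score, then snap it
    to the nearest discrete grade."""
    score = regress_score(features)
    return min(GRADES, key=lambda g: abs(g - score))
```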
arXiv Detail & Related papers (2021-11-30T06:28:58Z)
- CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning [22.757971831442426]
Training belief trackers often requires expensive turn-level annotations of every user utterance.
We propose a probabilistic dialog model, called the LAtent BElief State (LABES) model, where belief states are represented as discrete latent variables.
We introduce LABES-S2S, which is a copy-augmented Seq2Seq model instantiation of LABES.
arXiv Detail & Related papers (2020-09-17T07:26:37Z)
- Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs [65.28795726837386]
We introduce a meta-learning framework for imbalance length pairs.
We train it with a support set of long utterances and a query set of short utterances of varying lengths.
By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models.
arXiv Detail & Related papers (2020-04-06T17:53:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.