Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification
- URL: http://arxiv.org/abs/2601.06394v1
- Date: Sat, 10 Jan 2026 02:39:24 GMT
- Title: Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification
- Authors: Ahmed Abdelkawy, Ahmed Elsayed, Asem Ali, Aly Farag, Thomas Tretter, Michael McIntyre
- Abstract summary: We propose a novel three-stage framework for video-based student engagement measurement. First, we explore few-shot adaptation of a vision-language model for student action recognition. Second, we use a sliding temporal window to divide each student's 2-minute video into non-overlapping segments. Third, we leverage a large language model to classify the entire sequence of actions, together with the classroom context, as belonging to an engaged or disengaged student.
- Score: 0.6103775976356991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding student behavior in the classroom is essential to improving both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require substantial annotated data to model the diversity of student behaviors, yet privacy concerns often restrict researchers to their own proprietary datasets. Moreover, the classroom context, represented in peers' actions, is ignored. To address these limitations, we propose a novel three-stage framework for video-based student engagement measurement. First, we explore few-shot adaptation of a vision-language model (VLM) for student action recognition, fine-tuning it to distinguish among action categories with only a few training samples. Second, to handle continuous and unpredictable student actions, we use a sliding temporal window to divide each student's 2-minute video into non-overlapping segments. Each segment is assigned an action category by the fine-tuned VLM, producing a sequence of action predictions. Finally, we leverage a large language model (LLM) to classify this entire sequence of actions, together with the classroom context, as belonging to an engaged or disengaged student. Experimental results demonstrate the effectiveness of the proposed approach in identifying student engagement.
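The three stages above can be sketched end-to-end. This is a minimal illustration, not the authors' implementation: the VLM and LLM calls are replaced by stand-in functions, and the segment length, action labels, and prompt wording are assumptions.

```python
# Stage 2: divide a frame sequence into non-overlapping windows.
def split_into_segments(frames, segment_len):
    return [frames[i:i + segment_len]
            for i in range(0, len(frames), segment_len)]

# Stage 1 stand-in: a fine-tuned VLM would label each segment;
# here we return a placeholder action category.
def recognize_action(segment):
    return "writing"  # e.g. one of {"writing", "hand-raising", "talking"}

# Stage 3: pack the per-segment actions and the classroom (peer)
# context into a single prompt for an LLM to classify.
def build_engagement_prompt(action_sequence, peer_context):
    return (
        "Classroom context: " + peer_context + "\n"
        "Student actions per segment: " + ", ".join(action_sequence) + "\n"
        "Answer with 'engaged' or 'disengaged'."
    )

# A 2-minute clip at 1 frame/s, split into 10-second segments:
frames = list(range(120))
segments = split_into_segments(frames, 10)
actions = [recognize_action(s) for s in segments]
prompt = build_engagement_prompt(actions, "most peers are taking notes")
```

In the actual framework the placeholder labeler would be the few-shot-adapted VLM and the prompt would go to an LLM; only the windowing logic here is generic.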
Related papers
- Supervised Contrastive Learning for Ordinal Engagement Measurement [2.166000001057538]
Student engagement plays a crucial role in the successful delivery of educational programs. This paper identifies two key challenges in this problem: class imbalance and incorporating order into engagement levels. A novel approach to video-based student engagement measurement in virtual learning environments is proposed.
arXiv Detail & Related papers (2025-05-27T03:49:45Z) - ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation [66.8640112000444]
Temporal action segmentation and long-term action anticipation are popular vision tasks for the temporal analysis of actions in videos. We tackle these two problems jointly using a unified diffusion model dubbed ActFusion. We introduce a new anticipative masking strategy during training in which a late part of the video frames is masked as invisible, and learnable tokens replace these frames to learn to predict the invisible future.
arXiv Detail & Related papers (2024-12-05T17:12:35Z) - PALM: Predicting Actions through Language Models [74.10147822693791]
We introduce PALM, an approach that tackles the task of long-term action anticipation.
Our method incorporates an action recognition model to track previous action sequences and a vision-language model to articulate relevant environmental details.
Our experimental results demonstrate that PALM surpasses the state-of-the-art methods in the task of long-term action anticipation.
arXiv Detail & Related papers (2023-11-29T02:17:27Z) - Measuring Student Behavioral Engagement using Histogram of Actions [0.0]
The proposed approach recognizes student actions and then predicts the student's behavioral engagement level. For student action recognition, we use human skeletons to model student postures and upper-body movements. The trained 3D-CNN model is used to recognize actions within every 2-minute video segment.
arXiv Detail & Related papers (2023-07-18T16:37:37Z) - Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization [98.66318678030491]
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
We propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages.
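The multiple-instance idea behind classifying proposals under only video-level labels can be illustrated with a toy aggregation step. This is a generic MIL sketch, not the authors' P-MIL framework: the scores and the choice of k are made up, and the top-k averaging rule is a common MIL aggregator used here for illustration.

```python
# With only a video-level label, each candidate proposal gets a class
# score and the video-level score aggregates the top-k proposals; the
# video-level score is what gets trained against the video label.
def video_level_score(proposal_scores, k=2):
    top_k = sorted(proposal_scores, reverse=True)[:k]
    return sum(top_k) / len(top_k)

# Five candidate proposals for one action class in an untrimmed video:
scores = [0.1, 0.9, 0.2, 0.8, 0.05]
video_score = video_level_score(scores)
```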
arXiv Detail & Related papers (2023-05-29T02:48:04Z) - Bag of States: A Non-sequential Approach to Video-based Engagement Measurement [7.864500429933145]
Students' behavioral and emotional states need to be analyzed at fine-grained time scales in order to measure their level of engagement.
Many existing approaches have developed sequential and temporal models, such as recurrent neural networks, temporal convolutional networks, and three-dimensional convolutional neural networks, for measuring student engagement from videos.
We develop bag-of-words-based models in which only the occurrence of students' behavioral and emotional states is modeled and analyzed, not the order in which they occur.
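The order-free "bag" representation can be shown in a few lines: two sequences containing the same states in different orders map to the same feature vector. The state names here are invented for the example, not taken from the paper.

```python
from collections import Counter

def bag_of_states(state_sequence, vocabulary):
    """Count how often each behavioral/emotional state occurs,
    discarding temporal order."""
    counts = Counter(state_sequence)
    return [counts[s] for s in vocabulary]

vocab = ["attentive", "bored", "confused"]
seq_a = ["attentive", "bored", "attentive", "confused"]
seq_b = ["confused", "attentive", "attentive", "bored"]  # same states, new order
```

A downstream classifier then operates on these count vectors rather than on the raw sequence.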
arXiv Detail & Related papers (2023-01-17T07:12:34Z) - Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method comprising two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
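The fragment-wise temporal moving average can be sketched as a plain exponential moving average applied per fragment. Fragments are flat lists of floats here, and the momentum value is an assumption; the paper's fragment granularity and hyperparameters may differ.

```python
# teacher <- momentum * teacher + (1 - momentum) * student,
# applied element-wise to one model fragment.
def ema_update(teacher_fragment, student_fragment, momentum=0.99):
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_fragment, student_fragment)]

# Each named fragment of the teacher is updated from the matching
# student fragment, smoothing out noisy per-step student weights.
teacher = {"encoder": [1.0, 1.0], "head": [0.0, 0.0]}
student = {"encoder": [0.0, 2.0], "head": [1.0, 1.0]}
teacher = {name: ema_update(teacher[name], student[name])
           for name in teacher}
```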
arXiv Detail & Related papers (2022-12-13T12:14:09Z) - Detecting Disengagement in Virtual Learning as an Anomaly [4.706263507340607]
Student engagement is an important factor in meeting the goals of virtual learning programs.
In this paper, we formulate detecting disengagement in virtual learning as an anomaly detection problem.
We design various autoencoders, including a temporal convolutional network autoencoder and a long short-term memory autoencoder.
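The anomaly-detection formulation can be illustrated without a neural network: the paper trains autoencoders and flags high reconstruction error as disengagement, and as a dependency-free stand-in this sketch scores each sample by its squared distance to the mean of "normal" (engaged) training features and flags scores above a threshold. The feature values and the threshold rule are illustrative assumptions, not the authors' method.

```python
# Fit a trivial "model" of normal behavior: the per-dimension mean.
def fit_mean(samples):
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]

# Anomaly score: squared distance to the normal mean (stand-in for
# an autoencoder's reconstruction error).
def anomaly_score(sample, mean):
    return sum((x - m) ** 2 for x, m in zip(sample, mean))

normal = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1]]  # engaged training features
mean = fit_mean(normal)
train_scores = [anomaly_score(s, mean) for s in normal]
threshold = max(train_scores) * 2.0  # crude cutoff above the normal range

def is_disengaged(sample):
    return anomaly_score(sample, mean) > threshold
```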
arXiv Detail & Related papers (2022-11-13T10:29:25Z) - Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning [7.040747348755578]
Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability.
We propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy across different demographic groups such as race and gender.
arXiv Detail & Related papers (2022-08-02T00:22:20Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos [76.21297023629589]
We propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos.
Our method achieves state-of-the-art performance on four standard benchmark datasets.
arXiv Detail & Related papers (2020-07-28T12:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.