Multimodal Engagement Analysis from Facial Videos in the Classroom
- URL: http://arxiv.org/abs/2101.04215v2
- Date: Fri, 22 Jan 2021 18:53:32 GMT
- Title: Multimodal Engagement Analysis from Facial Videos in the Classroom
- Authors: \"Omer S\"umer, Patricia Goldberg, Sidney D'Mello, Peter Gerjets,
Ulrich Trautwein, Enkelejda Kasneci
- Abstract summary: The aim of this work is to provide the technical means to facilitate the manual data analysis of classroom videos in research on teaching quality and in the context of teacher training.
- Score: 5.202558003704116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Student engagement is a key construct for learning and teaching. While most
of the literature has explored student engagement analysis in computer-based
settings, this paper extends that focus to classroom instruction. To examine
students' visual engagement in the classroom, we conducted a study using
audiovisual recordings of classes at a secondary school over one and a half
months, acquired continuous engagement labels per student
(N=15) in repeated sessions, and explored computer vision methods to classify
engagement levels from faces in the classroom. We trained deep embeddings for
attentional and emotional features: Attention-Net for head pose estimation and
Affect-Net for facial expression recognition. We additionally trained
engagement classifiers, namely Support Vector Machines, Random Forest,
Multilayer Perceptron, and Long Short-Term Memory, on both feature sets. The
best-performing engagement classifiers achieved AUCs of .620
and .720 in Grades 8 and 12, respectively. We further investigated fusion
strategies and found that score-level fusion either improves the engagement
classifiers or is on par with the best-performing modality. We also
investigated the effect of personalization and found that using only 60 seconds
of person-specific data, selected by the margin uncertainty of the base
classifier, yielded an average AUC improvement of .084. Our main aim with this
work is to
provide the technical means to facilitate the manual data analysis of classroom
videos in research on teaching quality and in the context of teacher training.
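
To make the pipeline above concrete, here is a minimal sketch, in scikit-learn, of training one engagement classifier per feature stream and combining them with score-level fusion. It is not the authors' released code; the feature arrays, window labels, classifier choices, and the simple probability averaging are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Illustrative stand-ins for the two feature streams described in the abstract:
# attention features (head-pose embeddings) and affect features (facial-expression
# embeddings), one row per time window, with binary engaged / not-engaged labels.
X_attention = rng.normal(size=(400, 128))
X_affect    = rng.normal(size=(400, 256))
y           = rng.integers(0, 2, size=400)

train, test = np.arange(0, 300), np.arange(300, 400)

# One classifier per modality (an SVM and a Random Forest are shown here;
# the paper also evaluates MLPs and LSTMs).
clf_attention = SVC(probability=True).fit(X_attention[train], y[train])
clf_affect    = RandomForestClassifier(n_estimators=200).fit(X_affect[train], y[train])

p_attention = clf_attention.predict_proba(X_attention[test])[:, 1]
p_affect    = clf_affect.predict_proba(X_affect[test])[:, 1]

# Score-level fusion: combine per-modality probabilities (simple averaging here).
p_fused = (p_attention + p_affect) / 2.0

for name, p in [("attention", p_attention), ("affect", p_affect), ("fusion", p_fused)]:
    print(name, roc_auc_score(y[test], p))
```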
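
The personalization step can be sketched in a similar spirit: score unlabeled windows from the target student by the margin uncertainty of the base classifier and keep the 60 most uncertain windows (roughly 60 seconds at one window per second). Window length, feature shapes, and the refitting strategy below are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def margin_uncertainty(probabilities: np.ndarray) -> np.ndarray:
    """Margin uncertainty = 1 - (p_top1 - p_top2); larger means less confident."""
    part = np.sort(probabilities, axis=1)
    return 1.0 - (part[:, -1] - part[:, -2])

# Assumed setup: a base classifier trained on other students, plus unlabeled
# windows from the target student (one window per second, so 60 windows ~ 60 s).
rng = np.random.default_rng(0)
X_base, y_base = rng.normal(size=(500, 64)), rng.integers(0, 2, size=500)
X_person = rng.normal(size=(600, 64))

base_clf = LogisticRegression(max_iter=1000).fit(X_base, y_base)

scores = margin_uncertainty(base_clf.predict_proba(X_person))
selected = np.argsort(scores)[-60:]   # the 60 most uncertain windows (~60 s)

# These windows would then be labeled and used to adapt the classifier, e.g. by
# refitting on the pooled data (a simplification of the paper's setup).
X_adapt = np.vstack([X_base, X_person[selected]])
y_adapt = np.concatenate([y_base, rng.integers(0, 2, size=60)])  # placeholder labels
personal_clf = LogisticRegression(max_iter=1000).fit(X_adapt, y_adapt)
```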
Related papers
- Multimodality in Online Education: A Comparative Study [2.0472158451829827]
Current systems consider only a single cue and lack focus on the educational domain.
This paper highlights the need for a multimodal approach to affect recognition and its deployment in the online classroom.
It compares the various machine learning models available for each cue and provides the most suitable approach.
arXiv Detail & Related papers (2023-12-10T07:12:15Z) - Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence, in which external knowledge is usually incorporated when recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z) - Detecting Disengagement in Virtual Learning as an Anomaly [4.706263507340607]
Student engagement is an important factor in meeting the goals of virtual learning programs.
In this paper, we formulate detecting disengagement in virtual learning as an anomaly detection problem.
We design various autoencoders, including a temporal convolutional network autoencoder and a long short-term memory autoencoder.
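
As a rough illustration of this anomaly-detection formulation (not the paper's exact architecture), an LSTM autoencoder can be trained to reconstruct windows of behavioural features from engaged students only, and windows with high reconstruction error are then flagged as disengagement. All shapes, the brief training loop, and the threshold rule below are assumptions.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Reconstructs a window of per-frame features; trained on 'engaged' data only."""
    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):                                  # x: (batch, time, feat_dim)
        _, (h, _) = self.encoder(x)                        # h: (1, batch, hidden_dim)
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)     # repeat latent over time
        dec, _ = self.decoder(z)
        return self.out(dec)

model = LSTMAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

engaged_windows = torch.randn(256, 30, 32)   # toy stand-in for engaged-only sequences
for _ in range(5):                           # brief training loop for illustration
    optimizer.zero_grad()
    loss = criterion(model(engaged_windows), engaged_windows)
    loss.backward()
    optimizer.step()

# At test time, per-window reconstruction error acts as the anomaly score.
with torch.no_grad():
    test_windows = torch.randn(16, 30, 32)
    errors = ((model(test_windows) - test_windows) ** 2).mean(dim=(1, 2))
    flagged = errors > errors.mean() + 2 * errors.std()    # assumed threshold rule
```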
arXiv Detail & Related papers (2022-11-13T10:29:25Z) - Unsupervised Audio-Visual Lecture Segmentation [31.29084124332193]
We introduce AVLectures, a dataset consisting of 86 courses with over 2,350 lectures covering various STEM subjects.
Our second contribution is introducing video lecture segmentation that splits lectures into bite-sized topics that show promise in improving learner engagement.
We use these representations to generate segments using a temporally consistent 1-nearest neighbor algorithm, TW-FINCH.
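
The temporally consistent nearest-neighbour idea can be illustrated with a toy sketch: link every frame to its nearest neighbour under a distance that mixes feature similarity with temporal proximity, then read off the connected components as candidate segments. This is only a rough illustration of the general idea, not the published TW-FINCH algorithm; the feature dimensions and temporal weight are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def temporally_weighted_segments(features: np.ndarray, temporal_weight: float = 0.5):
    """Toy temporally consistent 1-NN grouping over per-frame lecture features."""
    n = len(features)
    t = np.arange(n, dtype=float) / n
    feat_dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    time_dist = np.abs(t[:, None] - t[None, :])
    dist = feat_dist + temporal_weight * n * time_dist   # mix feature and temporal distance
    np.fill_diagonal(dist, np.inf)

    nn = dist.argmin(axis=1)                             # each frame's 1-nearest neighbour
    rows = np.arange(n)
    graph = csr_matrix((np.ones(n), (rows, nn)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels

frames = np.random.default_rng(0).normal(size=(200, 16))  # stand-in lecture features
labels = temporally_weighted_segments(frames)
print("number of candidate segments:", labels.max() + 1)
```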
arXiv Detail & Related papers (2022-10-29T16:26:34Z) - Hierarchical Self-supervised Representation Learning for Movie
Understanding [24.952866206036536]
We propose a novel hierarchical self-supervised pretraining strategy that separately pretrains each level of our hierarchical movie understanding model.
Specifically, we propose to pretrain the low-level video backbone using a contrastive learning objective, while pretraining the higher-level video contextualizer using an event mask prediction task.
We first show that our self-supervised pretraining strategies are effective and lead to improved performance on all tasks and metrics on the VidSitu benchmark [37] (e.g., improving semantic role prediction from 47% to 61% CIDEr score).
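
A compact sketch of the two pretraining objectives described above: a contrastive (InfoNCE-style) loss for the low-level clip backbone and a masked event prediction loss for the higher-level contextualizer. The encoders, feature shapes, and masking rate are placeholders, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(view1: torch.Tensor, view2: torch.Tensor, temperature: float = 0.07):
    """InfoNCE-style loss between two views of the same low-level clip features."""
    z1, z2 = F.normalize(view1, dim=-1), F.normalize(view2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def event_mask_prediction_loss(contextualizer: torch.nn.Module,
                               event_feats: torch.Tensor, mask_ratio: float = 0.3):
    """Mask some event-level features and train the contextualizer to reconstruct them."""
    mask = torch.rand(event_feats.shape[:2]) < mask_ratio   # (batch, events)
    corrupted = event_feats.clone()
    corrupted[mask] = 0.0
    predicted = contextualizer(corrupted)
    return F.mse_loss(predicted[mask], event_feats[mask])

# Toy usage with stand-in features and a linear contextualizer.
clips_a, clips_b = torch.randn(8, 512), torch.randn(8, 512)
events = torch.randn(8, 6, 512)
contextualizer = torch.nn.Linear(512, 512)
loss = clip_contrastive_loss(clips_a, clips_b) + event_mask_prediction_loss(contextualizer, events)
```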
arXiv Detail & Related papers (2022-04-06T21:28:41Z) - vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
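
The temporal consistency idea can be sketched as an auxiliary loss that penalizes differences between a model's predictions on neighbouring frames of the same video kept in the replay memory; the exact formulation in vCLIMB may differ, and the frame pairing and loss weight below are assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(model: torch.nn.Module, frames_t: torch.Tensor,
                              frames_t1: torch.Tensor) -> torch.Tensor:
    """Penalize changes in predictions between consecutive frames of the same video."""
    logits_t = model(frames_t)
    logits_t1 = model(frames_t1)
    return F.mse_loss(logits_t.softmax(dim=-1), logits_t1.softmax(dim=-1))

# Toy usage: flattened frame tensors and a linear classifier stand in for a video model.
model = torch.nn.Linear(3 * 32 * 32, 10)
frames_t  = torch.randn(8, 3 * 32 * 32)   # frames at time t drawn from memory
frames_t1 = torch.randn(8, 3 * 32 * 32)   # their temporally adjacent frames
labels = torch.randint(0, 10, (8,))

ce = F.cross_entropy(model(frames_t), labels)
total = ce + 0.1 * temporal_consistency_loss(model, frames_t, frames_t1)  # assumed weight
total.backward()
```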
arXiv Detail & Related papers (2022-01-23T22:14:17Z) - Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation [149.9226057885554]
We propose seminar learning, a new learning paradigm for semantic segmentation with click-level supervision.
The rationale of seminar learning is to leverage the knowledge from different networks to compensate for insufficient information provided in click-level annotations.
Experimental results demonstrate the effectiveness of seminar learning, which achieves the new state-of-the-art performance of 72.51%.
arXiv Detail & Related papers (2021-08-30T17:27:43Z) - Class-Balanced Distillation for Long-Tailed Visual Recognition [100.10293372607222]
Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions.
In this work, we introduce a new framework based on the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting.
Our main contribution is a new training method that leverages knowledge distillation to enhance feature representations.
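
This observation suggests a two-stage recipe: first train a teacher with standard instance sampling, then train a student with class-balanced sampling while distilling the teacher's feature representation. The sketch below follows that general recipe with assumed architectures, a WeightedRandomSampler for class balancing, and a simple feature-matching distillation term; it is not the authors' exact method.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy long-tailed data: class 0 is frequent, class 1 is rare.
x = torch.randn(1000, 64)
y = torch.cat([torch.zeros(950, dtype=torch.long), torch.ones(50, dtype=torch.long)])

# Stand-in for a stage-1 teacher trained with instance sampling (left untrained here).
teacher = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU())
teacher.eval()

# Stage 2: class-balanced sampling for the student via per-sample inverse-frequency weights.
class_counts = torch.bincount(y).float()
weights = (1.0 / class_counts)[y]
sampler = WeightedRandomSampler(weights, num_samples=len(y), replacement=True)
loader = DataLoader(TensorDataset(x, y), batch_size=64, sampler=sampler)

student = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU())
head = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(list(student.parameters()) + list(head.parameters()), lr=0.1)

for xb, yb in loader:
    with torch.no_grad():
        teacher_feats = teacher(xb)
    student_feats = student(xb)
    loss = F.cross_entropy(head(student_feats), yb) \
         + F.mse_loss(student_feats, teacher_feats)   # feature-level distillation term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```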
arXiv Detail & Related papers (2021-04-12T08:21:03Z) - Privileged Knowledge Distillation for Online Action Detection [114.5213840651675]
Online Action Detection (OAD) in videos is proposed as a per-frame labeling task to address real-time prediction tasks.
This paper presents a novel learning-with-privileged-information framework for online action detection, where future frames, observable only at the training stage, are treated as a form of privileged information.
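
A hedged sketch of the learning-with-privileged-information idea: an offline teacher that sees past and future frames produces soft targets, and an online student that only sees frames up to the current time is trained to match them. The model definitions, window sizes, and the KL-based distillation loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, num_classes, past, future = 64, 5, 8, 8

# Offline teacher: consumes past + future frame features (privileged at training time).
# It is a randomly initialized stand-in here; in practice it would be trained first.
teacher = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear((past + future) * feat_dim, num_classes))
# Online student: only sees frames up to the current time.
student = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(past * feat_dim, num_classes))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

frames = torch.randn(32, past + future, feat_dim)   # toy per-frame features
labels = torch.randint(0, num_classes, (32,))        # action label of the current frame

with torch.no_grad():
    teacher_logits = teacher(frames)                 # teacher sees past and future

student_logits = student(frames[:, :past])           # the student never sees the future
kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
              F.softmax(teacher_logits, dim=-1), reduction="batchmean")
loss = F.cross_entropy(student_logits, labels) + kd
optimizer.zero_grad()
loss.backward()
optimizer.step()
```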
arXiv Detail & Related papers (2020-11-18T08:52:15Z) - Memory-augmented Dense Predictive Coding for Video Representation
Learning [103.69904379356413]
We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task.
We investigate visual-only self-supervised video representation learning from RGB frames, or from unsupervised optical flow, or both.
In all cases, we demonstrate state-of-the-art or comparable performance to other approaches with orders of magnitude less training data.
arXiv Detail & Related papers (2020-08-03T17:57:01Z) - Predicting Engagement in Video Lectures [24.415345855402624]
We introduce a novel, large dataset of video lectures for predicting context-agnostic engagement.
We propose both cross-modal and modality-specific feature sets to achieve this task.
We demonstrate the use of our approach in the case of data scarcity.
arXiv Detail & Related papers (2020-05-31T19:28:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.