Affect-driven Engagement Measurement from Videos
- URL: http://arxiv.org/abs/2106.10882v1
- Date: Mon, 21 Jun 2021 06:49:17 GMT
- Title: Affect-driven Engagement Measurement from Videos
- Authors: Ali Abedi and Shehroz Khan
- Abstract summary: We present a novel approach for video-based engagement measurement in virtual learning programs.
Deep-learning-based temporal and traditional machine-learning-based non-temporal models are trained and validated.
Our experiments show a state-of-the-art engagement-level classification accuracy of 63.3% and correct classification of disengagement videos.
- Score: 0.8545305424564517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In education and intervention programs, a person's engagement has been
identified as a major factor in successful program completion. Automatic
measurement of a person's engagement provides useful information for instructors
to meet program objectives and individualize program delivery. In this paper,
we present a novel approach for video-based engagement measurement in virtual
learning programs. We propose to use affect states, continuous values of
valence and arousal extracted from consecutive video frames, along with a new
latent affective feature vector and behavioral features for engagement
measurement. Deep-learning-based temporal and traditional
machine-learning-based non-temporal models are trained and validated on
frame-level and video-level features, respectively. In addition to the
conventional centralized learning, we also implement the proposed method in a
decentralized federated learning setting and study the effect of model
personalization in engagement measurement. We evaluated the performance of the
proposed method on the only two publicly available video engagement measurement
datasets, DAiSEE and EmotiW, containing videos of students in online learning
programs. Our experiments show a state-of-the-art engagement-level
classification accuracy of 63.3% and correct classification of disengagement
videos on the DAiSEE dataset, and a regression mean squared error of 0.0673 on the
EmotiW dataset. Our ablation study shows the effectiveness of incorporating
affect states in engagement measurement. We interpret the findings from the
experimental results based on psychology concepts in the field of engagement.
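As a rough, hedged illustration of the pipeline described in the abstract (not the authors' released code), the sketch below assumes that per-frame valence and arousal values, plus a couple of behavioral features, have already been extracted from a video; the resulting frame-level sequence is fed to a small LSTM that predicts one of DAiSEE's four engagement levels. The module names, feature dimensions, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): classify engagement level
# from a sequence of per-frame affect/behavioral features with an LSTM.
import torch
import torch.nn as nn

class EngagementLSTM(nn.Module):
    def __init__(self, feat_dim=4, hidden_dim=64, num_levels=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_levels)

    def forward(self, x):            # x: (batch, num_frames, feat_dim)
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1])    # logits over engagement levels

# Toy batch: 8 videos, 300 frames each, 4 features per frame
# (valence, arousal, and two hypothetical behavioral features).
features = torch.randn(8, 300, 4)
labels = torch.randint(0, 4, (8,))

model = EngagementLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(model(features), labels)
loss.backward()
optimizer.step()
```

For the non-temporal, video-level branch, the same per-frame values could instead be aggregated into summary statistics (for example, the mean and standard deviation of valence and arousal over the video) and passed to a conventional classifier or regressor. The abstract also mentions a decentralized federated learning setting with model personalization; it does not specify the aggregation scheme, so the fragment below only illustrates a FedAvg-style parameter average over hypothetical client models.

```python
# Hedged illustration of FedAvg-style aggregation; the paper's actual
# federated scheme and personalization strategy are not specified here.
def federated_average(state_dicts):
    """Average the parameters of several clients' copies of the same model."""
    return {name: sum(sd[name] for sd in state_dicts) / len(state_dicts)
            for name in state_dicts[0]}

client_a, client_b = EngagementLSTM(), EngagementLSTM()
model.load_state_dict(federated_average([client_a.state_dict(),
                                          client_b.state_dict()]))
```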
Related papers
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z) - Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks [2.4343669357792708]
This paper introduces a novel, privacy-preserving method for engagement measurement from videos.
It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution (a minimal landmark-extraction sketch is given after this list).
The proposed method is capable of being deployed on virtual learning platforms and measuring engagement in real-time.
arXiv Detail & Related papers (2024-03-25T20:43:23Z) - Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z) - Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z) - Bag of States: A Non-sequential Approach to Video-based Engagement Measurement [7.864500429933145]
Students' behavioral and emotional states need to be analyzed at fine-grained time scales in order to measure their level of engagement.
Many existing approaches have developed sequential and temporal models, such as recurrent neural networks, temporal convolutional networks, and three-dimensional convolutional neural networks, for measuring student engagement from videos.
We develop bag-of-words-based models in which only the occurrence of students' behavioral and emotional states is modeled and analyzed, not the order in which they occur.
arXiv Detail & Related papers (2023-01-17T07:12:34Z) - Detecting Disengagement in Virtual Learning as an Anomaly [4.706263507340607]
Student engagement is an important factor in meeting the goals of virtual learning programs.
In this paper, we formulate detecting disengagement in virtual learning as an anomaly detection problem.
We design various autoencoders, including a temporal convolutional network autoencoder and a long short-term memory autoencoder.
arXiv Detail & Related papers (2022-11-13T10:29:25Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively, producing a representation that emphasizes the novel information in the frame at the current time stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation training.
We propose a novel self-supervised co-training scheme to improve the popular infoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z) - Memory-augmented Dense Predictive Coding for Video Representation Learning [103.69904379356413]
We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task.
We investigate visual-only self-supervised video representation learning from RGB frames, or from unsupervised optical flow, or both.
In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude fewer training data.
arXiv Detail & Related papers (2020-08-03T17:57:01Z)
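For readers unfamiliar with the MediaPipe solution mentioned in the facial-landmark entry above, the following sketch shows one common way to extract per-frame face landmarks from a video with the generic FaceMesh API; it is not the cited paper's code, and the input file name and parameters are assumptions.

```python
# Illustrative sketch (not the cited paper's code): extract per-frame facial
# landmarks from a video with OpenCV and MediaPipe FaceMesh.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,  # treat frames as a continuous video stream
    max_num_faces=1,
)

cap = cv2.VideoCapture("student_video.mp4")  # hypothetical input file
landmark_sequence = []  # one list of (x, y, z) landmark tuples per frame

while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        face = results.multi_face_landmarks[0]
        landmark_sequence.append([(p.x, p.y, p.z) for p in face.landmark])

cap.release()
face_mesh.close()
# The landmark sequence could then feed a downstream engagement model, e.g. a
# spatial-temporal graph convolutional network as in the entry above.
```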