Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks
- URL: http://arxiv.org/abs/2403.17175v2
- Date: Wed, 02 Oct 2024 19:54:32 GMT
- Title: Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks
- Authors: Ali Abedi, Shehroz S. Khan
- Abstract summary: This paper introduces a novel, privacy-preserving method for engagement measurement from videos.
It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution.
The proposed method is capable of being deployed on virtual learning platforms and measuring engagement in real-time.
- Score: 2.4343669357792708
- License:
- Abstract: Engagement in virtual learning is crucial for a variety of factors including student satisfaction, performance, and compliance with learning programs, but measuring it is a challenging task. There is therefore considerable interest in utilizing artificial intelligence and affective computing to measure engagement in natural settings as well as on a large scale. This paper introduces a novel, privacy-preserving method for engagement measurement from videos. It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution. The extracted facial landmarks are fed to Spatial-Temporal Graph Convolutional Networks (ST-GCNs) to output the engagement level of the student in the video. To integrate the ordinal nature of the engagement variable into the training process, the ST-GCNs are trained in a novel ordinal learning framework based on transfer learning. Experimental results on two video student engagement measurement datasets show the superiority of the proposed method over previous methods, with new state-of-the-art results on the EngageNet dataset (a 3.1% improvement in four-class engagement level classification accuracy) and on the Online Student Engagement dataset (a 1.5% improvement in binary engagement classification accuracy). Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the developed ST-GCNs to interpret the engagement measurements obtained by the proposed method in both the spatial and temporal domains. The relatively lightweight and fast ST-GCN and its integration with the real-time MediaPipe solution make the proposed approach deployable on virtual learning platforms for real-time engagement measurement.
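The abstract describes a concrete data path: per-frame facial landmarks from MediaPipe are stacked over time and fed to an ST-GCN classifier. The sketch below illustrates that path only; it is not the authors' released code, and the clip length, tensor layout, and the `st_gcn_model` interface mentioned in the trailing comment are illustrative assumptions.

```python
# Minimal sketch of the landmark-to-ST-GCN data path described in the abstract.
# Not the authors' implementation: clip length, landmark handling, and the model
# interface are assumptions made for illustration only.
import cv2
import numpy as np
import mediapipe as mp

NUM_LANDMARKS = 468   # MediaPipe Face Mesh returns 468 facial landmarks per face
CLIP_LEN = 150        # assumed number of frames per engagement clip

def extract_landmark_clip(video_path: str) -> np.ndarray:
    """Return an array of shape (C=3, T, V=468): x, y, z per landmark per frame."""
    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                                max_num_faces=1)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < CLIP_LEN:
        ok, frame = cap.read()
        if not ok:
            break
        result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.multi_face_landmarks:
            continue  # skip frames where no face is detected
        lms = result.multi_face_landmarks[0].landmark
        frames.append([[p.x, p.y, p.z] for p in lms])   # (V, 3) coordinates only
    cap.release()
    face_mesh.close()
    if not frames:
        raise RuntimeError("no face detected in the clip")
    clip = np.asarray(frames, dtype=np.float32)          # (T, V, 3)
    return clip.transpose(2, 0, 1)                       # (C, T, V)

# The resulting (N, C, T, V) batch would then be fed to a spatial-temporal graph
# convolutional network whose graph connects neighbouring facial landmarks, e.g.
# logits = st_gcn_model(clip_batch), with a 4-class head for EngageNet or a
# 2-class head for the Online Student Engagement dataset (hypothetical names).
```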
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - Bag of States: A Non-sequential Approach to Video-based Engagement Measurement [7.864500429933145]
Students' behavioral and emotional states need to be analyzed at fine-grained time scales in order to measure their level of engagement.
Many existing approaches have developed sequential and temporal models, such as recurrent neural networks, temporal convolutional networks, and three-dimensional convolutional neural networks, for measuring student engagement from videos.
We develop bag-of-words-based models in which only the occurrence of students' behavioral and emotional states is modeled and analyzed, not the order in which they occur (a minimal sketch of this bag-of-states idea appears after this list).
arXiv Detail & Related papers (2023-01-17T07:12:34Z) - DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive Regularizer Learning [13.08779945306727]
Current state-of-the-art methods ignore the category information of objects, which is crucial for grasp pattern recognition.
This paper presents a novel dual-branch convolutional neural network (DcnnGrasp) to achieve joint learning of object category classification and grasp pattern recognition.
arXiv Detail & Related papers (2022-05-11T00:34:27Z) - COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance while being 10,800X faster in inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision [106.77639982059014]
We present the ConST-CL framework to effectively learn spatio-temporally fine-grained representations.
We first design a region-based self-supervised task which requires the model to learn to transform instance representations from one view to another guided by context features.
We then introduce a simple design that effectively reconciles the simultaneous learning of both holistic and local representations.
arXiv Detail & Related papers (2021-12-09T19:13:41Z) - Affect-driven Engagement Measurement from Videos [0.8545305424564517]
We present a novel approach for video-based engagement measurement in virtual learning programs.
Deep-learning-based temporal models and traditional machine-learning-based non-temporal models are trained and validated.
Our experiments show a state-of-the-art engagement level classification accuracy of 63.3% and correct classification of disengagement videos.
arXiv Detail & Related papers (2021-06-21T06:49:17Z) - Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network [2.2632368327435723]
In this paper, we present a novel end-to-end network architecture for students' engagement level detection in videos.
The 2D ResNet extracts spatial features from consecutive video frames, and the TCN analyzes the temporal changes in video frames to detect the level of engagement.
We compared our method with several competing students' engagement detection methods on this dataset.
arXiv Detail & Related papers (2021-04-20T17:10:13Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z) - MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model [21.971345137218886]
We propose to build a large database based on virtual IMUs and then address technical issues by introducing a multiple-domain deep learning framework consisting of three technical parts.
In the first part, we propose to learn the single-frame human activity from the noisy IMU data with hybrid convolutional neural networks (CNNs) in the semi-supervised form.
For the second part, the extracted data features are fused according to the principle of uncertainty-aware consistency.
The transfer learning is performed in the last part based on the newly released Archive of Motion Capture as Surface Shapes (AMASS) dataset.
arXiv Detail & Related papers (2020-09-20T10:35:14Z) - MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
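As referenced in the Bag of States entry above, the non-sequential approach models only how often behavioral and emotional states occur, not their order. The sketch below is a hedged illustration of that idea; the state vocabulary, normalization, and downstream classifier are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch of the "bag of states" idea: per-frame behavioral/emotional
# states are counted into an order-free histogram that feeds a standard
# classifier. The state vocabulary below is an illustrative assumption.
from collections import Counter
import numpy as np

STATE_VOCAB = ["looking_at_screen", "looking_away", "bored", "confused",
               "engaged_neutral", "engaged_positive"]   # assumed vocabulary

def bag_of_states(frame_states: list[str]) -> np.ndarray:
    """Map a sequence of per-frame states to a normalized occurrence histogram.

    The temporal order of the states is deliberately discarded; only how often
    each state occurs within the clip is retained.
    """
    counts = Counter(frame_states)
    hist = np.array([counts.get(s, 0) for s in STATE_VOCAB], dtype=np.float32)
    return hist / max(len(frame_states), 1)

# Usage: hist = bag_of_states(per_frame_state_labels); any standard classifier
# (logistic regression, random forest, ...) can then predict the engagement
# level from this fixed-length, order-free feature vector.
```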