Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning
- URL: http://arxiv.org/abs/2502.10813v1
- Date: Sat, 15 Feb 2025 14:37:09 GMT
- Title: Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning
- Authors: Sandeep Mandia, Kuldeep Singh, Rajendra Mitharwal, Faisel Mushtaq, Dimpal Janu,
- Abstract summary: This paper proposes EngageFormer, a transformer based architecture with sequence pooling using video modality for engagement classification.
The proposed architecture computes three views from the input video and processes them in parallel using transformer encoders.
A learning centered affective state dataset is curated from existing open source databases.
- Score: 2.127312905562737
- License:
- Abstract: The COVID-19 pandemic and the internet's availability have recently boosted online learning. However, monitoring engagement in online learning is a difficult task for teachers. In this context, timely automatic student engagement classification can help teachers in making adaptive adjustments to meet students' needs. This paper proposes EngageFormer, a transformer based architecture with sequence pooling using video modality for engagement classification. The proposed architecture computes three views from the input video and processes them in parallel using transformer encoders; the global encoder then processes the representation from each encoder, and finally, multi layer perceptron (MLP) predicts the engagement level. A learning centered affective state dataset is curated from existing open source databases. The proposed method achieved an accuracy of 63.9%, 56.73%, 99.16%, 65.67%, and 74.89% on Dataset for Affective States in E-Environments (DAiSEE), Bahcesehir University Multimodal Affective Database-1 (BAUM-1), Yawning Detection Dataset (YawDD), University of Texas at Arlington Real-Life Drowsiness Dataset (UTA-RLDD), and curated learning-centered affective state dataset respectively. The achieved results on the BAUM-1, DAiSEE, and YawDD datasets demonstrate state-of-the-art performance, indicating the superiority of the proposed model in accurately classifying affective states on these datasets. Additionally, the results obtained on the UTA-RLDD dataset, which involves two-class classification, serve as a baseline for future research. These results provide a foundation for further investigations and serve as a point of reference for future works to compare and improve upon.
Related papers
- Reflective Teacher: Semi-Supervised Multimodal 3D Object Detection in Bird's-Eye-View via Uncertainty Measure [5.510678909146336]
We introduce a novel concept of Reflective Teacher where the student is trained by both labeled and pseudo labeled data.
We also propose Geometry Aware BEV Fusion (GA-BEV) for efficient alignment of multi-modal BEV features.
arXiv Detail & Related papers (2024-12-05T16:54:39Z) - Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks [2.4343669357792708]
This paper introduces a novel, privacy-preserving method for engagement measurement from videos.
It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution.
The proposed method is capable of being deployed on virtual learning platforms and measuring engagement in real-time.
arXiv Detail & Related papers (2024-03-25T20:43:23Z) - Exploring Data Redundancy in Real-world Image Classification through
Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet scale data, and so high quality datasets are a necessity.
This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations.
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z) - Self-Supervised Representation Learning from Temporal Ordering of
Automated Driving Sequences [49.91741677556553]
We propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks.
We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems.
Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods.
arXiv Detail & Related papers (2023-02-17T18:18:27Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model.
Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.