Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network
- URL: http://arxiv.org/abs/2104.10122v1
- Date: Tue, 20 Apr 2021 17:10:13 GMT
- Title: Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network
- Authors: Ali Abedi and Shehroz S. Khan
- Abstract summary: In this paper, we present a novel end-to-end network architecture for students' engagement level detection in videos.
The 2D ResNet extracts spatial features from consecutive video frames, and the TCN analyzes the temporal changes in video frames to detect the level of engagement.
We compared our method with several competing students' engagement detection methods on this dataset.
- Score: 2.2632368327435723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic detection of students' engagement in online learning settings is a
key element to improve the quality of learning and to deliver personalized
learning materials to them. The varying level of engagement exhibited by students
in an online classroom is an affective behavior that takes place over space and
time. Therefore, we formulate detecting levels of students' engagement from
videos as a spatio-temporal classification problem. In this paper, we present a
novel end-to-end Residual Network (ResNet) and Temporal Convolutional Network
(TCN) hybrid neural network architecture for students' engagement level
detection in videos. The 2D ResNet extracts spatial features from consecutive
video frames, and the TCN analyzes the temporal changes in video frames to
detect the level of engagement. The spatial and temporal arms of the hybrid
network are jointly trained on raw video frames of a large publicly available
students' engagement detection dataset, DAiSEE. We compared our method with
several competing students' engagement detection methods on this dataset. The
ResNet+TCN architecture outperforms all other studied methods, improves the
state-of-the-art engagement level detection accuracy, and sets a new baseline
for future research.
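As a rough illustration of the described spatio-temporal pipeline, the sketch below applies a 2D ResNet to each frame and a small stack of dilated temporal convolutions to the resulting feature sequence. The resnet18 backbone, the number and dilation of the temporal blocks, the mean-over-time pooling, and the four output classes (DAiSEE-style engagement levels) are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class TemporalBlock(nn.Module):
    """Dilated 1D convolution block with a residual connection (a basic TCN building block)."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        padding = (kernel_size - 1) * dilation // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # residual connection; sequence length is preserved


class ResNetTCN(nn.Module):
    """2D ResNet extracts per-frame spatial features; a TCN models their temporal evolution."""
    def __init__(self, feat_dim=512, num_classes=4, num_tcn_blocks=3):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the final FC head
        self.tcn = nn.Sequential(
            *[TemporalBlock(feat_dim, dilation=2 ** i) for i in range(num_tcn_blocks)]
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width) raw video frames
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)  # (B, T, 512)
        temporal = self.tcn(feats.transpose(1, 2))  # (B, 512, T)
        pooled = temporal.mean(dim=2)                # average over time
        return self.classifier(pooled)               # engagement-level logits


# Example: a batch of 2 clips, 16 frames each, 3x112x112
logits = ResNetTCN()(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 4])
```

Training both arms jointly on raw frames, as the abstract describes, would then amount to optimizing a standard cross-entropy loss over such clip-level logits.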
Related papers
- Study of the effect of Sharpness on Blind Video Quality Assessment [0.0]
This study explores the effect of sharpness on blind video quality assessment (BVQA) models.
Sharpness is a measure of the clarity and detail of the video image.
The study uses existing video quality databases such as CVD2014; a simple sharpness-proxy sketch follows this entry.
arXiv Detail & Related papers (2024-04-06T16:10:48Z) - Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks [2.4343669357792708]
- Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks [2.4343669357792708]
This paper introduces a novel, privacy-preserving method for engagement measurement from videos.
It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution.
The proposed method can be deployed on virtual learning platforms and measure engagement in real time; a landmark-extraction sketch follows this entry.
arXiv Detail & Related papers (2024-03-25T20:43:23Z) - ASF-Net: Robust Video Deraining via Temporal Alignment and Online
- ASF-Net: Robust Video Deraining via Temporal Alignment and Online Adaptive Learning [47.10392889695035]
We propose a new computational paradigm, Alignment-Shift-Fusion Network (ASF-Net), which incorporates a temporal shift module.
We construct a LArge-scale RAiny video dataset (LARA) to support the development of this research community.
Our proposed approach exhibits superior performance on three benchmarks and compelling visual quality in real-world scenarios.
arXiv Detail & Related papers (2023-09-02T14:50:13Z) - Balancing Accuracy and Training Time in Federated Learning for Violence
Detection in Surveillance Videos: A Study of Neural Network Architectures [0.0]
The study includes experiments with spatio-temporal features extracted from benchmark video datasets.
Various machine learning techniques, including super-convergence and transfer learning, are explored.
The research achieves better accuracy than state-of-the-art models by training the best-performing violence detection model in a federated learning context; a generic federated-averaging sketch follows this entry.
arXiv Detail & Related papers (2023-06-29T19:44:02Z) - Deeply-Coupled Convolution-Transformer with Spatial-temporal
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z) - Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
arXiv Detail & Related papers (2022-02-07T18:04:10Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Self-Supervised Adaptation for Video Super-Resolution [7.26562478548988]
Single-image super-resolution (SISR) networks can adapt their network parameters to specific input images.
We present a new learning algorithm that allows conventional video super-resolution (VSR) networks to adapt their parameters to the test video frames; a generic test-time adaptation sketch follows this entry.
arXiv Detail & Related papers (2021-03-18T08:30:24Z) - Fully Convolutional Networks for Continuous Sign Language Recognition [83.85895472824221]
- Fully Convolutional Networks for Continuous Sign Language Recognition [83.85895472824221]
Continuous sign language recognition is a challenging task that requires learning on both spatial and temporal dimensions.
We propose a fully convolutional network (FCN) for online sign language recognition (SLR) to concurrently learn spatial and temporal features from weakly annotated video sequences.
arXiv Detail & Related papers (2020-07-24T08:16:37Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.