DMCNet: Diversified Model Combination Network for Understanding
Engagement from Video Screengrabs
- URL: http://arxiv.org/abs/2204.06454v1
- Date: Wed, 13 Apr 2022 15:24:38 GMT
- Title: DMCNet: Diversified Model Combination Network for Understanding
Engagement from Video Screengrabs
- Authors: Sarthak Batra, Hewei Wang, Avishek Nag, Philippe Brodeur, Marianne
Checkley, Annette Klinkert, and Soumyabrata Dev
- Abstract summary: Engagement plays a major role in developing intelligent educational interfaces.
Non-deep learning models are based on the combination of popular algorithms such as Histogram of Oriented Gradients (HOG), Support Vector Machine (SVM), Scale Invariant Feature Transform (SIFT), and Speeded Up Robust Features (SURF).
The deep learning methods include Densely Connected Convolutional Networks (DenseNet-121), Residual Network (ResNet-18) and MobileNetV1.
- Score: 0.4397520291340695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Engagement is an essential indicator of the Quality-of-Learning Experience
(QoLE) and plays a major role in developing intelligent educational interfaces.
The number of people learning through Massive Open Online Courses (MOOCs) and
other online resources has been increasing rapidly because they provide us with
the flexibility to learn from anywhere at any time. This provides a good
learning experience for the students. However, such a learning interface requires
the ability to recognize the level of engagement of the students for a holistic
learning experience, which is useful for students and educators alike.
Understanding engagement, however, is a challenging task because of its
subjectivity and the difficulty of collecting data. In this paper, we propose a variety
of models that have been trained on an open-source dataset of video
screengrabs. Our non-deep learning models are based on the combination of
popular algorithms such as Histogram of Oriented Gradients (HOG), Support Vector
Machine (SVM), Scale Invariant Feature Transform (SIFT) and Speeded Up Robust
Features (SURF). The deep learning methods include Densely Connected
Convolutional Networks (DenseNet-121), Residual Network (ResNet-18) and
MobileNetV1. We show the performance of each model using a variety of metrics
such as the Gini Index, Adjusted F-Measure (AGF), and Area Under the Receiver
Operating Characteristic curve (AUC). We use various dimensionality reduction
techniques such as Principal Component Analysis (PCA) and t-Distributed
Stochastic Neighbor Embedding (t-SNE) to understand the distribution of data in
the feature sub-space. Our work will thereby assist educators and students
in obtaining a fruitful and efficient online learning experience.
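
To make the classical pipeline concrete, here is a minimal sketch of the non-deep-learning route described above (HOG descriptors fed to an SVM). It assumes a hypothetical load_dataset() helper returning screengrab paths and engagement labels; the image size and SVM hyperparameters are illustrative choices, not the paper's settings.

```python
# Minimal sketch (not the authors' exact setup): HOG features + SVM classifier.
# load_dataset() is a hypothetical helper for the open-source screengrab dataset.
import numpy as np
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def extract_hog(path, size=(128, 128)):
    """Load a screengrab in grayscale, resize it, and return its HOG descriptor."""
    image = resize(imread(path, as_gray=True), size)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

paths, labels = load_dataset()  # hypothetical loader: image paths + engagement labels
X = np.stack([extract_hog(p) for p in paths])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)

clf = SVC(kernel="rbf", probability=True)  # SVM on top of HOG features
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```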
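A corresponding sketch for the deep-learning baselines: a torchvision ResNet-18 pretrained on ImageNet with its final layer replaced for engagement classes. The number of classes, optimizer, and learning rate are assumptions; DenseNet-121 would follow the same pattern with model.classifier in place of model.fc.

```python
# Minimal sketch of a deep baseline: fine-tune a pretrained ResNet-18.
# NUM_CLASSES and the training hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # assumed number of engagement levels
device = "cuda" if torch.cuda.is_available() else "cpu"

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new classification head
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(loader):
    """One pass over a DataLoader yielding (screengrab batch, label batch)."""
    model.train()
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), targets.to(device))
        loss.backward()
        optimizer.step()
```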
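Finally, a brief sketch of the evaluation and feature-space analysis, assuming the common relation Gini = 2*AUC - 1 and a PCA-then-t-SNE projection of the extracted features; AGF is omitted because scikit-learn has no built-in implementation.

```python
# Minimal sketch of the evaluation/analysis side. Gini = 2*AUC - 1 is the common
# definition and an assumption about the paper's exact usage.
from sklearn.metrics import roc_auc_score
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def gini_from_auc(y_true, y_score):
    """Gini index derived from the ROC AUC (binary case: positive-class scores)."""
    return 2.0 * roc_auc_score(y_true, y_score) - 1.0

def embed_2d(features, pca_dims=50):
    """Project a (n_samples, n_features) matrix to 2-D: PCA to compress, t-SNE to visualize."""
    compressed = PCA(n_components=min(pca_dims, features.shape[1])).fit_transform(features)
    return TSNE(n_components=2, init="pca", perplexity=30).fit_transform(compressed)
```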
Related papers
- Reinforcement Learning Based Multi-modal Feature Fusion Network for
Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and CIFAR-10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- Pre-training Contextualized World Models with In-the-wild Videos for
Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Exploring Effective Factors for Improving Visual In-Context Learning [56.14208975380607]
In-Context Learning (ICL) refers to understanding a new task from a few demonstrations (a prompt) and predicting new inputs without tuning the model.
This paper shows that prompt selection and prompt fusion are two major factors that have a direct impact on the inference performance of visual in-context learning.
We propose prompt-SelF, a simple framework for visual in-context learning.
arXiv Detail & Related papers (2023-04-10T17:59:04Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated Learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Learnable Graph Convolutional Network and Feature Fusion for Multi-view
Learning [30.74535386745822]
This paper proposes a joint deep learning framework called Learnable Graph Convolutional Network and Feature Fusion (LGCN-FF).
It consists of two stages: a feature fusion network and a learnable graph convolutional network.
The proposed LGCN-FF is validated to be superior to various state-of-the-art methods in multi-view semi-supervised classification.
arXiv Detail & Related papers (2022-11-16T19:07:12Z)
- X-Learner: Learning Cross Sources and Tasks for Universal Visual
Representation [71.51719469058666]
We propose a representation learning framework called X-Learner.
X-Learner learns universal features for multiple vision tasks supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities, or computational costs.
arXiv Detail & Related papers (2022-03-16T17:23:26Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Student sentiment Analysis Using Classification With Feature Extraction
Techniques [0.0]
This paper describes web-based learning and its effectiveness for students.
We examine how machine learning techniques such as Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (DT) can be applied to student sentiment analysis.
arXiv Detail & Related papers (2021-02-01T18:48:06Z)
- Sense and Learn: Self-Supervision for Omnipresent Sensors [9.442811508809994]
We present a framework named Sense and Learn for representation or feature learning from raw sensory data.
It consists of several auxiliary tasks that can learn high-level and broadly useful features entirely from unannotated data without any human involvement in the tedious labeling process.
Our methodology achieves results that are competitive with supervised approaches and, in most cases, closes the gap by fine-tuning the network on the downstream tasks.
arXiv Detail & Related papers (2020-09-28T11:57:43Z)
- Analyzing Student Strategies In Blended Courses Using Clickstream Data [32.81171098036632]
We use pattern mining and models borrowed from Natural Language Processing to understand student interactions.
Fine-grained clickstream data is collected through Diderot, a non-commercial educational support system.
Our results suggest that the proposed hybrid NLP methods can provide valuable insights even in the low-data setting of blended courses.
arXiv Detail & Related papers (2020-05-31T03:01:00Z)
- Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least a 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)