MATT: Multimodal Attention Level Estimation for e-learning Platforms
- URL: http://arxiv.org/abs/2301.09174v1
- Date: Sun, 22 Jan 2023 18:18:20 GMT
- Title: MATT: Multimodal Attention Level Estimation for e-learning Platforms
- Authors: Roberto Daza, Luis F. Gomez, Aythami Morales, Julian Fierrez, Ruben
Tolosana, Ruth Cobos, Javier Ortega-Garcia
- Abstract summary: This work presents a new multimodal system for remote attention level estimation based on multimodal face analysis.
Our multimodal approach uses different parameters and signals obtained from the behavior and physiological processes that have been related to modeling cognitive load.
The experimental framework uses mEBAL, a public multimodal database for attention level estimation collected in an e-learning environment.
- Score: 16.407885871027887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a new multimodal system for remote attention level
estimation based on multimodal face analysis. Our multimodal approach uses
different parameters and signals obtained from the behavior and physiological
processes that have been related to modeling cognitive load, such as facial
gestures (e.g., blink rate, facial action units) and user actions (e.g., head
pose, distance to the camera). The multimodal system uses the following modules
based on Convolutional Neural Networks (CNNs): Eye blink detection, head pose
estimation, facial landmark detection, and facial expression features. First,
we individually evaluate the proposed modules in the task of estimating the
student's attention level captured during online e-learning sessions. For that,
we train binary classifiers (high or low attention) based on Support Vector
Machines (SVM) for each module. Second, we measure to what extent multimodal
score-level fusion improves the attention level estimation. The experiments use
mEBAL, a public multimodal database for attention level estimation collected in
an e-learning environment, which contains data from 38 users performing several
e-learning tasks of variable difficulty (inducing changes in student cognitive
load).
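As a hedged illustration of the pipeline the abstract describes (one SVM per module, then score-level fusion), the minimal sketch below uses scikit-learn; the module names, feature dimensions, and equal-weight averaging are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of per-module SVM classifiers with score-level fusion.
# Feature contents, array shapes, and fusion weights are illustrative
# assumptions, not the paper's exact setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-module features for N windows of an e-learning session.
N = 200
modules = {
    "blink": rng.normal(size=(N, 4)),       # e.g., blink rate statistics
    "head_pose": rng.normal(size=(N, 6)),   # e.g., yaw/pitch/roll dynamics
    "landmarks": rng.normal(size=(N, 10)),  # e.g., landmark-based distances
    "expression": rng.normal(size=(N, 8)),  # e.g., facial expression features
}
y = rng.integers(0, 2, size=N)  # 1 = high attention, 0 = low attention

# One binary SVM per module; probability=True exposes per-sample scores.
svms = {name: SVC(probability=True).fit(X, y) for name, X in modules.items()}

# Score-level fusion: average the per-module probabilities of "high attention".
scores = np.mean(
    [svm.predict_proba(modules[name])[:, 1] for name, svm in svms.items()],
    axis=0,
)
fused_prediction = (scores >= 0.5).astype(int)
```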
Related papers
- DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning [18.36413246876648]
This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos.
Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load in the form of high or low attention.
Our method outperforms existing state-of-the-art accuracies on the public mEBAL2 benchmark.
arXiv Detail & Related papers (2024-08-10T11:39:11Z) - Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition [6.995226697189459]
We employ a multi-modal self-supervised learning method for facial expression recognition from in-the-wild video data.
Our results generally show that multi-modal self-supervision tasks offer large performance gains for challenging tasks.
We release our pre-trained models as well as source code publicly.
arXiv Detail & Related papers (2024-04-16T20:51:36Z)
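As a rough illustration of the multi-modal self-supervision idea in the entry above, the sketch below shows a generic cross-modal contrastive (InfoNCE) objective between two modality embeddings; the paper's actual pretext tasks, shapes, and temperature are not specified here and are assumptions.

```python
# Hedged sketch: a generic cross-modal contrastive (InfoNCE) objective,
# one common form of multi-modal self-supervision. Not necessarily the
# paper's pretext task; shapes and temperature are assumptions.
import torch
import torch.nn.functional as F

def infonce(video_emb: torch.Tensor, audio_emb: torch.Tensor, tau: float = 0.07):
    """video_emb, audio_emb: (B, D) embeddings of the same B clips."""
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(audio_emb, dim=-1)
    logits = v @ a.t() / tau               # (B, B) similarity matrix
    targets = torch.arange(v.size(0))      # matching pairs lie on the diagonal
    # Symmetric loss: video-to-audio and audio-to-video retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```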
arXiv Detail & Related papers (2024-04-16T20:51:36Z) - Improved Baselines for Data-efficient Perceptual Augmentation of LLMs [66.05826802808177]
In computer vision, large language models (LLMs) can be used to prime vision-language tasks such as image captioning and visual question answering.
We present an experimental evaluation of different interfacing mechanisms, across multiple tasks.
We identify a new interfacing mechanism that yields (near) optimal results across different tasks, while obtaining a 4x reduction in training time.
arXiv Detail & Related papers (2024-03-20T10:57:17Z)
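To make the idea of an "interfacing mechanism" concrete, the sketch below shows one common design: a learned linear map projects frozen vision features into the LLM's token-embedding space as a visual prefix. Whether this matches the paper's best-performing mechanism is an assumption; all dimensions are illustrative.

```python
# Hedged sketch of one common LLM interfacing mechanism: a learned linear
# map turns frozen vision features into "visual prefix" tokens in the
# LLM's embedding space. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class VisualPrefix(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096, n_prefix=8):
        super().__init__()
        self.proj = nn.Linear(vision_dim, n_prefix * llm_dim)
        self.n_prefix, self.llm_dim = n_prefix, llm_dim

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (B, vision_dim) from a frozen vision encoder.
        prefix = self.proj(image_features)           # (B, n_prefix * llm_dim)
        return prefix.view(-1, self.n_prefix, self.llm_dim)

# The (B, n_prefix, llm_dim) output is concatenated with the text token
# embeddings before being fed to the LLM.
```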
arXiv Detail & Related papers (2024-03-20T10:57:17Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring the pretrained models to downstream tasks may encounter a task discrepancy, since pretraining is formulated as image classification or object discrimination.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
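The core mechanics of multi-task supervised pretraining can be sketched as a shared backbone feeding per-task heads whose losses are summed; the backbone, heads, and target shapes below are placeholders, not MTP's actual architecture (which uses segmentation, instance segmentation, and rotated detection heads).

```python
# Hedged sketch of a generic multi-task supervised pretraining step:
# a shared backbone feeds per-task heads and the task losses are summed.
# Heads and shapes are placeholders, not MTP's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared backbone producing (B, 64, 8, 8) feature maps from RGB input.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(8))
seg_head = nn.Conv2d(64, 21, 1)          # semantic segmentation logits
cls_head = nn.Linear(64 * 8 * 8, 10)     # auxiliary classification head

def pretraining_step(images, seg_targets, cls_targets):
    """images: (B,3,H,W); seg_targets: (B,8,8) long; cls_targets: (B,) long."""
    feats = backbone(images)              # shared representation
    loss_seg = F.cross_entropy(seg_head(feats), seg_targets)
    loss_cls = F.cross_entropy(cls_head(feats.flatten(1)), cls_targets)
    return loss_seg + loss_cls            # losses summed across tasks
```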
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Expanding Frozen Vision-Language Models without Retraining: Towards
Improved Robot Perception [0.0]
Vision-language models (VLMs) have shown powerful capabilities in visual question answering and reasoning tasks.
In this paper, we demonstrate a method of aligning the embedding spaces of different modalities to the vision embedding space.
We show that using multiple modalities as input improves the VLM's scene understanding and enhances its overall performance in various tasks.
arXiv Detail & Related papers (2023-08-31T06:53:55Z)
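One minimal way to align another modality's embeddings to a frozen vision embedding space is a small learned projection trained on paired samples; the sketch below uses a cosine loss, and the encoders, dimensions, and loss choice are assumptions rather than the paper's method.

```python
# Hedged sketch: aligning another modality's embeddings to a frozen vision
# embedding space with a small learned projection, trained to minimize the
# distance between paired samples. Dimensions and loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

vision_dim, audio_dim = 512, 128
align = nn.Linear(audio_dim, vision_dim)        # trainable alignment map
opt = torch.optim.Adam(align.parameters(), lr=1e-3)

def alignment_step(audio_emb, vision_emb):
    """audio_emb: (B, audio_dim); vision_emb: (B, vision_dim), frozen."""
    projected = align(audio_emb)
    # Cosine loss pulls each projected embedding toward its paired vision
    # embedding; the frozen VLM can then consume `projected` as if it
    # were a vision embedding.
    loss = 1.0 - F.cosine_similarity(projected, vision_emb.detach()).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```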
arXiv Detail & Related papers (2023-08-31T06:53:55Z) - SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video
Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
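The sketch below shows one way multi-head self-attention can be inserted into a 3D convolutional backbone, by flattening spatio-temporal feature maps into a token sequence and adding the attended output residually; the channel counts and placement are assumptions, not SSMTL++'s exact architecture.

```python
# Hedged sketch of multi-head self-attention inside a 3D conv backbone:
# (B, C, T, H, W) feature maps are flattened into tokens, attended over,
# and reshaped back with a residual connection. Details are assumptions.
import torch
import torch.nn as nn

class Conv3dWithMHSA(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, T, H, W) spatio-temporal feature maps.
        x = self.conv(x)
        b, c, t, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, T*H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return x + attended.transpose(1, 2).view(b, c, t, h, w)  # residual
```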
arXiv Detail & Related papers (2022-07-16T19:25:41Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based
Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
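The conceptual similarity the entry mentions can be sketched as scoring region features against class representations that combine few-shot visual prototypes (metric-based meta-learning) with class semantic embeddings (prompt-based learning); the fusion rule and shapes below are assumptions, not the paper's model.

```python
# Hedged sketch of the shared idea behind metric-based meta-learning and
# prompt-based learning: regions are scored against class representations
# built from few-shot visual prototypes and class semantic embeddings.
import torch
import torch.nn.functional as F

def classify_regions(region_feats, support_feats, text_embs, alpha=0.5):
    """
    region_feats:  (R, D)     features of R candidate regions
    support_feats: (C, K, D)  K few-shot examples per class
    text_embs:     (C, D)     class semantic (e.g., name) embeddings
    """
    prototypes = support_feats.mean(dim=1)            # (C, D) visual prototypes
    class_reps = alpha * prototypes + (1 - alpha) * text_embs
    region = F.normalize(region_feats, dim=-1)
    classes = F.normalize(class_reps, dim=-1)
    return region @ classes.t()                       # (R, C) similarity scores
```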
arXiv Detail & Related papers (2022-04-16T16:45:06Z) - Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z)
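A power-normalized second-order representation, in its simplest form, aggregates local features into an autocorrelation matrix and applies an element-wise power to flatten its spectrum; the exponent and pooling below are illustrative, and MlSo's full multi-level pipeline is more involved.

```python
# Hedged sketch of a power-normalized second-order descriptor: local
# features are pooled into second-order statistics, then element-wise
# power normalization is applied. Exponent and pooling are assumptions.
import numpy as np

def second_order_pool(feats: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """feats: (N, D) local feature vectors from one image or clip."""
    M = feats.T @ feats / feats.shape[0]              # (D, D) second-order stats
    return np.sign(M) * np.abs(M) ** gamma            # power normalization

feats = np.random.default_rng(0).normal(size=(100, 32))
descriptor = second_order_pool(feats).ravel()         # flattened descriptor
```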
arXiv Detail & Related papers (2022-01-15T19:49:00Z) - MOPT: Multi-Object Panoptic Tracking [33.77171216778909]
We introduce a novel perception task denoted as multi-object panoptic tracking (MOPT).
MOPT allows for exploiting pixel-level semantic information of 'thing' and 'stuff' classes, temporal coherence, and pixel-level associations over time.
We present extensive quantitative and qualitative evaluations of both vision-based and LiDAR-based MOPT that demonstrate encouraging results.
arXiv Detail & Related papers (2020-04-17T11:45:28Z)
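As a hedged illustration of the temporal-association step any panoptic tracker needs, the sketch below matches instance masks across consecutive frames by mask IoU with the Hungarian algorithm; this is a generic baseline, not MOPT's actual model.

```python
# Hedged sketch of temporal association for panoptic tracking: instance
# masks in consecutive frames are matched by mask IoU via the Hungarian
# algorithm. A generic baseline, not MOPT's method.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(prev_masks, curr_masks, iou_thresh=0.5):
    """prev_masks, curr_masks: lists of boolean (H, W) instance masks."""
    iou = np.zeros((len(prev_masks), len(curr_masks)))
    for i, p in enumerate(prev_masks):
        for j, c in enumerate(curr_masks):
            inter = np.logical_and(p, c).sum()
            union = np.logical_or(p, c).sum()
            iou[i, j] = inter / union if union else 0.0
    rows, cols = linear_sum_assignment(-iou)          # maximize total IoU
    # Keep only confident matches; unmatched current masks start new tracks.
    return [(i, j) for i, j in zip(rows, cols) if iou[i, j] >= iou_thresh]
```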
arXiv Detail & Related papers (2020-04-17T11:45:28Z) - A System for Real-Time Interactive Analysis of Deep Learning Training [66.06880335222529]
Currently available systems can monitor only logged data that must be specified before the training process starts.
We present a new system that enables users to perform interactive queries on live processes generating real-time information.
arXiv Detail & Related papers (2020-01-05T11:33:31Z)