DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning
- URL: http://arxiv.org/abs/2408.05523v2
- Date: Wed, 14 Aug 2024 14:34:34 GMT
- Title: DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning
- Authors: Roberto Daza, Luis F. Gomez, Julian Fierrez, Aythami Morales, Ruben Tolosana, Javier Ortega-Garcia
- Abstract summary: This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos.
Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load as high or low attention.
Our method surpasses existing state-of-the-art accuracies on the public mEBAL2 benchmark.
- Score: 18.36413246876648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos. Our method is particularly useful, among other settings, in e-learning applications, so we trained, evaluated, and compared our approach on the mEBAL2 database, a public multi-modal database acquired in an e-learning environment. mEBAL2 comprises data from 60 users who performed 8 different tasks. These tasks varied in difficulty, leading to changes in their cognitive loads. Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load as high or low attention. Several behavioral signals and physiological processes related to cognitive load are used, such as eyeblink, heart rate, facial action units, and head pose, among others. Furthermore, we conduct a study to understand which individual features obtain better results, which combinations are most efficient, how local and global features compare, and how the length of the temporal window affects attention level estimation, among other aspects. We find that global facial features are more appropriate for multimodal systems using score-level fusion, particularly as the temporal window increases. On the other hand, local features are more suitable for fusion through neural network training than through score-level fusion approaches. Our method surpasses existing state-of-the-art accuracies on the public mEBAL2 benchmark.
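As a rough illustration of the score-level fusion described in the abstract, per-feature classifiers could each output an attention score for a temporal window, and the scores are then combined into a single high/low decision. The signal names, weights, and threshold below are illustrative assumptions, not the authors' actual configuration:

```python
# Minimal sketch of score-level fusion of per-modality attention scores.
scores = {
    "eyeblink": 0.31,       # hypothetical per-modality attention scores
    "heart_rate": 0.72,     # in [0, 1] for one temporal window
    "action_units": 0.64,
    "head_pose": 0.58,
}

def fuse_scores(scores: dict, weights: dict, threshold: float = 0.5):
    """Weighted average of per-modality scores, then a binary
    high/low attention decision."""
    fused = sum(weights[name] * s for name, s in scores.items())
    return fused, ("high" if fused >= threshold else "low")

# Uniform weights as a default; the paper studies which feature
# combinations work best, so these would be tuned on validation data.
uniform = {name: 1.0 / len(scores) for name in scores}
fused, label = fuse_scores(scores, uniform)
print(f"fused score = {fused:.2f} -> {label} attention")
```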
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
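A minimal sketch of that idea, with `env`, `policy`, and `expert_intervenes` as hypothetical placeholders; the intervention signal itself provides the (negative) reward for an off-policy learner:

```python
# Sketch of the RLIF idea: human interventions themselves serve as
# (negative) rewards for off-policy RL. `env`, `policy`, and
# `expert_intervenes` are hypothetical placeholders.
def collect_rollout(env, policy, expert_intervenes, max_steps=200):
    transitions = []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        # The expert taking over is treated as negative feedback;
        # there is no hand-designed task reward.
        reward = -1.0 if expert_intervenes(obs, action) else 0.0
        next_obs, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        if done:
            break
        obs = next_obs
    return transitions  # fed to an off-policy RL algorithm
```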
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification [67.64124512185087]
Soft biometrics such as gait are widely used with face in surveillance tasks like person recognition and re-identification.
We propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks.
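In its common form, keyless attention scores each feature vector from its own content (no external query) and pools the sequence with the resulting softmax weights. A minimal PyTorch sketch, with dimensions and names assumed for illustration:

```python
import torch
import torch.nn as nn

class KeylessAttention(nn.Module):
    """Sketch of keyless attention: pooling weights are computed from
    the feature vectors themselves, with no external query."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, T, 1)
        return (weights * feats).sum(dim=1)                # (B, dim)

# Fusing hypothetical face and gait embeddings of a shared dimension:
face = torch.randn(4, 10, 128)   # 10 face feature vectors per sample
gait = torch.randn(4, 12, 128)   # 12 gait feature vectors per sample
fused = KeylessAttention(128)(torch.cat([face, gait], dim=1))
```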
arXiv Detail & Related papers (2023-03-24T05:28:35Z)
- MATT: Multimodal Attention Level Estimation for e-learning Platforms [16.407885871027887]
This work presents a new multimodal system for remote attention level estimation based on multimodal face analysis.
Our multimodal approach uses different parameters and signals obtained from behavioral and physiological processes that have been related to cognitive load.
The experimental framework uses the mEBAL database, a public multi-modal database for attention level estimation acquired in an e-learning environment.
arXiv Detail & Related papers (2023-01-22T18:18:20Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptati On (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance across six different datasets with distinctive affective representations.
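A generic sketch of the underlying idea of adapting only the last layer of a frozen encoder with a contrastive objective; this is not the paper's exact inhibitory mechanism, and all shapes are assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen pretrained encoder (stand-in) plus a trainable,
# dataset-specific last layer.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256))
for p in encoder.parameters():
    p.requires_grad = False      # pretrained encoder stays fixed
adapter = nn.Linear(256, 128)    # dataset-specific last layer

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull same-expression pairs together, push different ones apart."""
    d = F.pairwise_distance(z1, z2)
    return torch.mean(same_label * d.pow(2)
                      + (1 - same_label) * F.relu(margin - d).pow(2))

x1, x2 = torch.randn(8, 64, 64), torch.randn(8, 64, 64)
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(adapter(encoder(x1)), adapter(encoder(x2)), same)
```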
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Facial Emotion Recognition using Deep Residual Networks in Real-World Environments [5.834678345946704]
We propose a facial feature extractor model trained on a large, in-the-wild video dataset.
The dataset consists of a million labelled frames and 2,616 subjects.
As temporal information is important to the emotion recognition domain, we utilise LSTM cells to capture the temporal dynamics in the data.
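A minimal sketch of that temporal modelling step: an LSTM consumes per-frame facial features and a classifier predicts the emotion from the last hidden state (feature and class dimensions assumed for illustration):

```python
import torch
import torch.nn as nn

class TemporalEmotionHead(nn.Module):
    """LSTM over per-frame facial features, as the summary describes."""
    def __init__(self, feat_dim=512, hidden=256, num_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats):          # (batch, frames, feat_dim)
        out, _ = self.lstm(frame_feats)
        return self.classifier(out[:, -1])   # predict from last time step

logits = TemporalEmotionHead()(torch.randn(2, 16, 512))  # (2, 7)
```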
arXiv Detail & Related papers (2021-11-04T10:08:22Z)
- Towards a Real-Time Facial Analysis System [13.649384403827359]
We present a system-level design of a real-time facial analysis system.
With a collection of deep neural networks for object detection, classification, and regression, the system recognizes age, gender, facial expression, and facial similarity for each person that appears in the camera view.
Results on common off-the-shelf architectures show that the system's accuracy is comparable to state-of-the-art methods and that the recognition speed satisfies real-time requirements.
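A high-level sketch of the kind of pipeline described: one face detector feeds several task-specific networks per detected face. All names here are hypothetical placeholders, not the paper's actual components:

```python
# One detector feeds several task-specific networks per detected face.
def analyse_frame(frame, detector, heads):
    """Run face detection, then every analysis head on each face crop."""
    results = []
    for box in detector(frame):  # placeholder: face bounding boxes
        crop = frame[box.top:box.bottom, box.left:box.right]
        results.append({name: head(crop) for name, head in heads.items()})
    return results

# `heads` might map "age", "gender", "expression", "embedding" to models;
# real-time operation depends on model sizes and batching the crops.
```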
arXiv Detail & Related papers (2021-09-21T18:27:15Z)
- Attentive Cross-modal Connections for Deep Multimodal Wearable-based Emotion Recognition [7.559720049837459]
We present a novel attentive cross-modal connection to share information between convolutional neural networks.
Specifically, these connections improve emotion classification by sharing intermediate representations between EDA and ECG.
Our experiments show that the proposed approach learns strong multimodal representations and outperforms a number of baseline methods.
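A simplified sketch of such a cross-modal connection, where each branch receives a gated share of the other branch's intermediate representation (a loose reading of the idea, with assumed dimensions):

```python
import torch
import torch.nn as nn

class CrossModalConnection(nn.Module):
    """Mix intermediate features of two signal branches (e.g. EDA and
    ECG) through learned gates before further processing."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        joint = torch.cat([feat_a, feat_b], dim=-1)
        # Each branch receives an attention-weighted share of the other.
        return (feat_a + self.gate_a(joint) * feat_b,
                feat_b + self.gate_b(joint) * feat_a)

eda, ecg = torch.randn(8, 64), torch.randn(8, 64)
eda_mixed, ecg_mixed = CrossModalConnection(64)(eda, ecg)
```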
arXiv Detail & Related papers (2021-08-04T18:40:32Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while having much lower complexity.
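The Monte Carlo dropout component can be sketched independently of the bilinear network: dropout stays active at inference, and the spread over repeated stochastic forward passes serves as the uncertainty estimate (the model below is a stand-in, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Stand-in classifier over 68 facial landmarks (136 = 68 x 2 coords).
model = nn.Sequential(nn.Linear(136, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 7))

def mc_dropout_predict(model, x, n_samples=30):
    model.train()  # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1)
                             for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)  # prediction, uncertainty

mean_p, std_p = mc_dropout_predict(model, torch.randn(1, 136))
```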
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)