CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild
- URL: http://arxiv.org/abs/2306.15073v1
- Date: Mon, 26 Jun 2023 21:20:23 GMT
- Title: CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild
- Authors: Li Ding, Jack Terwilliger, Aishni Parab, Meng Wang, Lex Fridman, Bruce Mehler, Bryan Reimer
- Abstract summary: Real-time analysis of the dynamics of the eye region allows us to monitor humans' visual attention allocation and estimate their mental state.
We propose CLERA, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework.
We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation.
- Score: 18.79132232751083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-intrusive, real-time analysis of the dynamics of the eye region allows us
to monitor humans' visual attention allocation and estimate their mental state
during the performance of real-world tasks, which can potentially benefit a
wide range of human-computer interaction (HCI) applications. While commercial
eye-tracking devices have been frequently employed, the difficulty of
customizing these devices places unnecessary constraints on the exploration of
more efficient, end-to-end models of eye dynamics. In this work, we propose
CLERA, a unified model for Cognitive Load and Eye Region Analysis, which
achieves precise keypoint detection and spatiotemporal tracking in a
joint-learning framework. Our method demonstrates significant efficiency and
outperforms prior work on tasks including cognitive load estimation, eye
landmark detection, and blink estimation. We also introduce a large-scale
dataset of 30k human faces with joint pupil, eye-openness, and landmark
annotation, which aims to support future HCI research on human factors and
eye-related analysis.
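To make the joint-learning idea concrete, here is a minimal sketch of a shared backbone feeding separate keypoint and cognitive-load heads. The layer sizes, keypoint count, and two-class load head are illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class JointEyeAnalysisModel(nn.Module):
    """Hypothetical joint-learning setup: one shared feature extractor,
    one head regressing eye-region keypoints, one head classifying
    cognitive load. All dimensions are illustrative placeholders."""

    def __init__(self, num_keypoints: int = 7, num_load_classes: int = 2):
        super().__init__()
        self.num_keypoints = num_keypoints
        # Shared convolutional backbone over eye-region crops.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task head 1: (x, y) coordinates per keypoint.
        self.keypoint_head = nn.Linear(64, num_keypoints * 2)
        # Task head 2: cognitive load logits (e.g., low vs. high).
        self.load_head = nn.Linear(64, num_load_classes)

    def forward(self, frames: torch.Tensor):
        feats = self.backbone(frames)                                  # (batch, 64)
        keypoints = self.keypoint_head(feats).view(-1, self.num_keypoints, 2)
        load_logits = self.load_head(feats)
        return keypoints, load_logits

model = JointEyeAnalysisModel()
frames = torch.randn(4, 3, 64, 64)   # a batch of eye-region crops
kp, load = model(frames)             # shapes: (4, 7, 2) and (4, 2)
```

In a joint-learning setup like this, training would typically minimize a weighted sum of a keypoint regression loss and a cross-entropy load loss over the shared parameters.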
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning [18.36413246876648]
This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos.
Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load in the form of high or low attention.
Our method outperforms existing state-of-the-art accuracies using the public mEBAL2 benchmark.
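As a rough illustration of score-level ensembling of facial-analysis modules, here is a minimal sketch; the module set, score scale, and threshold are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def ensemble_attention(scores: np.ndarray, threshold: float = 0.5) -> str:
    """Average per-module attention scores (each assumed in [0, 1]) from
    independent facial-analysis techniques and threshold the mean into a
    binary high/low attention label."""
    mean_score = float(np.mean(scores))
    return "high" if mean_score >= threshold else "low"

# e.g., scores from hypothetical blink, head-pose, and gaze modules
print(ensemble_attention(np.array([0.8, 0.6, 0.7])))  # -> "high"
```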
arXiv Detail & Related papers (2024-08-10T11:39:11Z)
- Using Deep Learning to Increase Eye-Tracking Robustness, Accuracy, and Precision in Virtual Reality [2.2639735235640015]
This work provides an objective assessment of the impact of several contemporary machine learning (ML)-based methods for eye feature tracking.
Metrics include the accuracy and precision of the gaze estimate, as well as drop-out rate.
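These metrics have standard definitions in the eye-tracking literature; the sketch below computes mean angular error (accuracy), RMS sample-to-sample dispersion (precision), and drop-out rate under those common definitions, which may differ in detail from the paper's.

```python
import numpy as np

def gaze_metrics(gaze_deg: np.ndarray, target_deg: np.ndarray) -> dict:
    """gaze_deg and target_deg are (N, 2) arrays of horizontal/vertical
    gaze angles in degrees; NaN rows mark dropped samples."""
    valid = ~np.isnan(gaze_deg).any(axis=1)
    dropout_rate = 1.0 - valid.mean()                 # fraction of lost samples
    err = np.linalg.norm(gaze_deg[valid] - target_deg[valid], axis=1)
    accuracy = err.mean()                             # mean angular error (deg)
    diffs = np.diff(gaze_deg[valid], axis=0)          # sample-to-sample jitter
    precision = np.sqrt((np.linalg.norm(diffs, axis=1) ** 2).mean())  # RMS-S2S
    return {"accuracy": accuracy, "precision": precision, "dropout": dropout_rate}
```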
arXiv Detail & Related papers (2024-03-28T18:43:25Z)
- Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification [67.64124512185087]
Soft biometrics such as gait are widely used alongside face recognition in surveillance tasks such as person recognition and re-identification.
We propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks.
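A minimal sketch of keyless attention follows, assuming learned per-time-step scalar scores with no query/key pairing; the shared scorer and feature dimensions are simplifications for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class KeylessAttentionFusion(nn.Module):
    """Each modality's feature sequence is weighted by learned scalar
    scores and pooled over time, then the pooled face and gait vectors
    are concatenated. A single scorer is shared across modalities here
    purely for brevity."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one learned score per time step

    def pool(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, dim) -> attention-weighted sum over time
        weights = torch.softmax(self.score(seq), dim=1)   # (batch, time, 1)
        return (weights * seq).sum(dim=1)                 # (batch, dim)

    def forward(self, face_seq: torch.Tensor, gait_seq: torch.Tensor):
        return torch.cat([self.pool(face_seq), self.pool(gait_seq)], dim=-1)
```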
arXiv Detail & Related papers (2023-03-24T05:28:35Z)
- TMHOI: Translational Model for Human-Object Interaction Detection [18.804647133922195]
We propose an innovative graph-based approach to detect human-object interactions (HOIs).
Our method effectively captures the sentiment representation of HOIs by integrating both spatial and semantic knowledge.
Our approach outperformed existing state-of-the-art graph-based methods by a significant margin.
arXiv Detail & Related papers (2023-03-07T21:52:10Z)
- Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models [6.642042615005632]
Eye-tracking has the potential to provide rich behavioral data about human cognition in ecologically valid environments.
This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time.
arXiv Detail & Related papers (2022-11-20T12:24:57Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, by consuming dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
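For intuition about how a graph network can mix human-joint and object nodes, here is a minimal graph-convolution step; treating the dynamic descriptors as extra feature dimensions on object nodes is an assumption for illustration, not HO-GCN's exact design.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """Minimal graph-convolution step: node features (human joints plus
    object nodes, where object nodes could carry dynamic descriptors as
    appended features) are mean-aggregated over a normalized adjacency
    and passed through a learned projection."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (nodes, in_dim); adj: (nodes, nodes) with self-loops included
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.proj((adj / deg) @ x))
```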
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data.
Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront.
Our model generalizes well in the task of EEG sleep stage segmentation.
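DETRtime itself is a DETR-style detection framework; the sketch below only illustrates how EEG time-series segmentation can be framed as per-time-step classification (e.g., fixation / saccade / blink), and is not the paper's architecture.

```python
import torch
import torch.nn as nn

class EEGSegmenter(nn.Module):
    """Hypothetical 1-D convolutional labeler that assigns an event class
    to every EEG time step, illustrating the segmentation task framing."""

    def __init__(self, channels: int = 128, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(64, num_classes, kernel_size=1),
        )

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # eeg: (batch, channels, time) -> per-time-step class logits
        return self.net(eeg)
```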
arXiv Detail & Related papers (2022-06-17T10:17:24Z)
- Automatic Gaze Analysis: A Survey of Deep Learning based Approaches [61.32686939754183]
Eye gaze analysis is an important research problem in the field of computer vision and Human-Computer Interaction.
Several questions remain open, including which cues are important for interpreting gaze direction in an unconstrained environment.
We review the progress across a range of gaze analysis tasks and applications to shed light on these fundamental questions.
arXiv Detail & Related papers (2021-08-12T00:30:39Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for the trajectory and pose forecasting tasks.
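A minimal sketch of how such a visibility indicator can mask a forecasting loss; the loss form and tensor shapes are assumptions for illustration.

```python
import torch

def masked_joint_loss(pred: torch.Tensor, target: torch.Tensor,
                      visible: torch.Tensor) -> torch.Tensor:
    """Per-joint errors only count where the joint is marked visible.
    pred/target: (frames, joints, 2); visible: (frames, joints) in {0, 1}."""
    err = ((pred - target) ** 2).sum(dim=-1)            # squared error per joint
    return (err * visible).sum() / visible.sum().clamp(min=1.0)
```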
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.