VisioPhysioENet: Multimodal Engagement Detection using Visual and Physiological Signals
- URL: http://arxiv.org/abs/2409.16126v1
- Date: Tue, 24 Sep 2024 14:36:19 GMT
- Title: VisioPhysioENet: Multimodal Engagement Detection using Visual and Physiological Signals
- Authors: Alakhsimar Singh, Nischay Verma, Kanav Goyal, Amritpal Singh, Puneet Kumar, Xiaobai Li
- Abstract summary: We present VisioPhysioENet, a novel system that leverages visual cues and physiological signals to detect engagement.
We rigorously evaluate the system on the DAiSEE dataset, where it achieves an accuracy of 63.09%.
- Score: 12.238387391165071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents VisioPhysioENet, a novel multimodal system that leverages visual cues and physiological signals to detect learner engagement. It employs a two-level approach for visual feature extraction using the Dlib library for facial landmark extraction and the OpenCV library for further estimations. This is complemented by extracting physiological signals using the plane-orthogonal-to-skin method to assess cardiovascular activity. These features are integrated using advanced machine learning classifiers, enhancing the detection of various engagement levels. We rigorously evaluate VisioPhysioENet on the DAiSEE dataset, where it achieves an accuracy of 63.09%, demonstrating a superior ability to discern various levels of engagement compared to existing methodologies. The proposed system's code can be accessed at https://github.com/MIntelligence-Group/VisioPhysioENet.
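As a concrete illustration of the physiological branch, below is a minimal sketch of the plane-orthogonal-to-skin (POS) algorithm (Wang et al., 2017) that the abstract names. The `rgb_traces` input (per-frame mean RGB of the facial skin region), the frame rate, and the overlap-add windowing follow the original POS paper rather than the authors' released code, so treat this as an assumption-laden sketch, not the paper's implementation.

```python
import numpy as np

def pos_pulse(rgb_traces: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """Recover a cardiovascular pulse signal from per-frame mean RGB values
    of a facial skin ROI via the plane-orthogonal-to-skin (POS) projection."""
    T = rgb_traces.shape[0]
    win = int(1.6 * fps)                    # ~1.6 s sliding window (POS paper default)
    proj = np.array([[0.0, 1.0, -1.0],      # projection onto the plane orthogonal
                     [-2.0, 1.0, 1.0]])     # to the skin-tone axis
    pulse = np.zeros(T)
    for start in range(T - win + 1):
        block = rgb_traces[start:start + win]            # (win, 3)
        cn = block / (block.mean(axis=0) + 1e-9)         # temporal normalization
        s = proj @ cn.T                                  # (2, win)
        alpha = s[0].std() / (s[1].std() + 1e-9)         # alpha-tuning step
        p = s[0] + alpha * s[1]
        pulse[start:start + win] += p - p.mean()         # overlap-add
    return pulse
```

Engagement classification would then consume features derived from this pulse (e.g., heart-rate statistics) alongside the Dlib/OpenCV visual features.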
Related papers
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, driven by generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Visual Neural Decoding via Improved Visual-EEG Semantic Consistency [3.4061238650474657]
Methods that directly map EEG features to the CLIP embedding space may introduce mapping bias and cause semantic inconsistency.
We propose a Visual-EEG Semantic Decouple Framework that explicitly extracts the semantic-related features of these two modalities to facilitate optimal alignment.
Our method achieves state-of-the-art results in zero-shot neural decoding tasks.
arXiv Detail & Related papers (2024-08-13T10:16:10Z)
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
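The alternation FEC describes can be sketched as a k-means-style loop; the L2 assignment metric and the residual feature update below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def fec_step(feats: np.ndarray, reps: np.ndarray):
    """One alternation of grouping and feature update.
    feats: (N, D) pixel features; reps: (K, D) cluster representatives."""
    # Group: each pixel joins its nearest representative (L2 metric assumed).
    dists = ((feats[:, None, :] - reps[None, :, :]) ** 2).sum(-1)   # (N, K)
    assign = dists.argmin(axis=1)
    # Abstract new representatives as cluster means (keep old center if empty).
    reps = np.stack([feats[assign == k].mean(axis=0) if (assign == k).any() else reps[k]
                     for k in range(reps.shape[0])])
    # Update pixel features toward their current representative (residual mix, assumed).
    feats = 0.5 * feats + 0.5 * reps[assign]
    return feats, reps
```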
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
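A bottleneck residual adapter of the kind described, attachable after a frozen CLIP encoder block, might look like the following; the hidden width and residual scale are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Lightweight bottleneck adapter added to a frozen pre-trained encoder."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual path keeps the frozen features; the adapter only nudges them.
        return x + self.scale * self.up(self.act(self.down(x)))
```

Stacking one adapter per encoder level gives the stepwise, multi-level enhancement the summary describes while training only the small adapter weights.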
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Personality Trait Recognition using ECG Spectrograms and Deep Learning [6.6157730528755065]
This paper presents an innovative approach to recognizing personality traits using deep learning (DL) methods applied to electrocardiogram (ECG) signals.
Within the framework of the Big Five personality model, encompassing extraversion, neuroticism, agreeableness, conscientiousness, and openness, the research explores the potential of ECG-derived spectrograms as informative features.
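Turning an ECG trace into a spectrogram image for a CNN is straightforward; the sampling rate and window settings below are placeholders, not the paper's configuration.

```python
import numpy as np
from scipy.signal import spectrogram

def ecg_to_log_spectrogram(ecg: np.ndarray, fs: int = 256) -> np.ndarray:
    """Convert a 1-D ECG trace into a log-scaled spectrogram for a DL model.
    Window/overlap sizes are illustrative, not the paper's exact settings."""
    f, t, sxx = spectrogram(ecg, fs=fs, nperseg=fs, noverlap=fs // 2)
    return np.log1p(sxx)   # compress dynamic range before feeding a CNN
```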
arXiv Detail & Related papers (2024-02-06T19:09:44Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
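The global image-text contrastive term can be illustrated with a standard symmetric InfoNCE loss; this is the generic CLIP-style objective, not MLIP's full knowledge-guided formulation.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over matched image/report pairs in a batch."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```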
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Image complexity based fMRI-BOLD visual network categorization across visual datasets using topological descriptors and deep-hybrid learning [3.522950356329991]
The aim of this study is to examine how network topology differs in response to distinct visual stimuli from visual datasets.
To achieve this, 0- and 1-dimensional persistence diagrams are computed for each visual network representing COCO, ImageNet, and SUN.
The extracted K-means cluster features are fed to a novel deep-hybrid model that yields accuracy in the range of 90%-95% in classifying these visual networks.
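One plausible reading of the pipeline, sketched with the `ripser` and scikit-learn packages: compute 0-/1-dimensional persistence of a network's point-cloud embedding, then summarize the diagrams with K-means centers as a fixed-length feature. The featurization details are assumptions.

```python
import numpy as np
from ripser import ripser          # persistent homology (pip install ripser)
from sklearn.cluster import KMeans

def topological_descriptor(points: np.ndarray, k: int = 4) -> np.ndarray:
    """Summarize 0-/1-dimensional persistence of a visual-network embedding
    as K-means centers over (birth, death) pairs -> fixed-length feature."""
    dgms = ripser(points, maxdim=1)["dgms"]                         # [H0, H1]
    pts = np.vstack([d[np.isfinite(d).all(axis=1)] for d in dgms])  # drop infinite bars
    km = KMeans(n_clusters=k, n_init=10).fit(pts)
    return km.cluster_centers_.ravel()      # feeds the deep-hybrid classifier
```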
arXiv Detail & Related papers (2023-11-03T14:05:57Z)
- Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
- Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z)
- Retinal Structure Detection in OCTA Image via Voting-based Multi-task Learning [27.637273690432608]
We propose a novel Voting-based Adaptive Feature Fusion multi-task network (VAFF-Net) for joint segmentation, detection, and classification of RV, FAZ, and RVJ.
A task-specific voting gate module is proposed to adaptively extract and fuse different features for specific tasks at two levels.
To facilitate further research, part of these datasets with the source code and evaluation benchmark have been released for public access.
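A simplified reading of a task-specific voting gate: each feature branch casts a learned vote, and a softmax over the votes weights the fusion. This is a sketch of the general pattern, not VAFF-Net's exact module.

```python
import torch
import torch.nn as nn

class VotingGate(nn.Module):
    """Fuse multi-level feature branches with learned per-branch votes."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one learned vote per branch feature

    def forward(self, branches: list) -> torch.Tensor:
        # branches: list of (B, dim) features from different levels/encoders.
        stacked = torch.stack(branches, dim=1)       # (B, n_branches, dim)
        weights = torch.softmax(self.score(stacked), dim=1)
        return (weights * stacked).sum(dim=1)        # (B, dim) fused feature
```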
arXiv Detail & Related papers (2022-08-23T05:53:04Z)
- An Algorithm for the Labeling and Interactive Visualization of the Cerebrovascular System of Ischemic Strokes [59.116811751334225]
VirtualDSA++ is an algorithm designed to segment and label the cerebrovascular tree on CTA scans.
We extend the labeling mechanism for the cerebral arteries to identify occluded vessels.
We present the generic concept of an iterative systematic search for pathways over all nodes of the vessel model, which enables new interactive features.
arXiv Detail & Related papers (2022-04-26T14:20:26Z)
- A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities.
This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner.
We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
arXiv Detail & Related papers (2022-03-31T13:14:00Z)
- Facial Anatomical Landmark Detection using Regularized Transfer Learning with Application to Fetal Alcohol Syndrome Recognition [24.27777060287004]
Fetal alcohol syndrome (FAS) caused by prenatal alcohol exposure can result in a series of cranio-facial anomalies.
Anatomical landmark detection is important for detecting the presence of FAS-associated facial anomalies.
Current deep learning-based heatmap regression methods, designed for facial landmark detection in natural images, assume the availability of large datasets.
We develop a new regularized transfer learning approach that exploits the knowledge of a network learned on large facial recognition datasets.
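One common way to regularize such transfer is an L2-SP-style penalty that anchors fine-tuned weights to the source face-recognition network; whether the paper uses exactly this form is an assumption.

```python
import torch
import torch.nn as nn

def l2_sp_penalty(model: nn.Module, source_state: dict,
                  strength: float = 1e-3) -> torch.Tensor:
    """L2-SP-style regularizer: penalize drift from source-network weights."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in source_state:                      # shared layers only
            anchor = source_state[name].to(param.device)
            penalty = penalty + ((param - anchor) ** 2).sum()
    return strength * penalty
```

The penalty is simply added to the landmark-detection loss during fine-tuning, keeping the small-data model close to the knowledge learned on large facial recognition datasets.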
arXiv Detail & Related papers (2021-09-12T11:05:06Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
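A generalized re-weighting of the classification loss against the long-tailed class prior can be sketched as follows; the interpolation exponent `gamma` is an illustrative knob, not the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

def reweighted_ce(logits: torch.Tensor, labels: torch.Tensor,
                  class_counts: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Cross-entropy weighted against the class prior; gamma interpolates
    between no re-weighting (0) and inverse-frequency weighting (1)."""
    weights = (1.0 / class_counts.float()) ** gamma
    weights = weights / weights.sum() * len(class_counts)   # normalize to mean 1
    return F.cross_entropy(logits, labels, weight=weights.to(logits.device))
```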
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Dynamic Graph Modeling of Simultaneous EEG and Eye-tracking Data for Reading Task Identification [79.41619843969347]
We present a new approach, called AdaGTCN, for identifying human reader intent from electroencephalogram (EEG) and eye movement (EM) data.
Our method, Adaptive Graph Temporal Convolution Network (AdaGTCN), uses an Adaptive Graph Learning Layer and Deep Neighborhood Graph Convolution Layer.
We compare our approach with several baselines to report an improvement of 6.29% on the ZuCo 2.0 dataset, along with extensive ablation experiments.
arXiv Detail & Related papers (2021-02-21T18:19:49Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Classifying Eye-Tracking Data Using Saliency Maps [8.524684315458245]
This paper proposes a novel visual-saliency-based feature extraction method for automatic and quantitative classification of eye-tracking data.
Comparing the saliency amplitudes and the similarity and dissimilarity of saliency maps with the corresponding eye-fixation maps gives an extra dimension of information, which is effectively utilized to generate discriminative features for classifying the eye-tracking data.
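Two standard saliency-comparison quantities, Pearson correlation (CC) and histogram intersection (SIM), illustrate the kind of similarity/dissimilarity features described; the exact feature set used in the paper may differ.

```python
import numpy as np

def saliency_fixation_features(sal: np.ndarray, fix: np.ndarray) -> np.ndarray:
    """Compare a saliency map with a fixation density map; CC and SIM are
    standard saliency metrics, used here as illustrative features."""
    s = (sal - sal.mean()) / (sal.std() + 1e-9)
    f = (fix - fix.mean()) / (fix.std() + 1e-9)
    cc = (s * f).mean()                              # Pearson correlation
    sn, fn = sal / (sal.sum() + 1e-9), fix / (fix.sum() + 1e-9)
    sim = np.minimum(sn, fn).sum()                   # histogram intersection
    return np.array([cc, sim])
```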
arXiv Detail & Related papers (2020-10-24T15:18:07Z)
- Multi-Scale Neural network for EEG Representation Learning in BCI [2.105172041656126]
We propose a novel deep multi-scale neural network that discovers feature representations in multiple frequency/time ranges.
By representing EEG signals with spectral-temporal information, the proposed method can be utilized for diverse paradigms.
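A multi-scale temporal block of this kind can be sketched with parallel 1-D convolutions whose kernel widths span several time ranges; the kernel sizes and channel counts here are illustrative.

```python
import torch
import torch.nn as nn

class MultiScaleEEGBlock(nn.Module):
    """Parallel temporal convolutions over different kernel sizes capture
    multiple frequency/time ranges in the EEG signal."""
    def __init__(self, channels: int, out_per_scale: int = 16,
                 kernels: tuple = (5, 15, 45)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, out_per_scale, k, padding=k // 2) for k in kernels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, eeg_channels, time) -> concatenated per-scale feature maps
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
```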
arXiv Detail & Related papers (2020-03-02T04:06:47Z)