Related papers: Combining Facial Videos and Biosignals for Stress Estimation During Driving

URL: http://arxiv.org/abs/2601.04376v2
Date: Sat, 10 Jan 2026 18:23:31 GMT
Title: Combining Facial Videos and Biosignals for Stress Estimation During Driving
Authors: Paraskevi Valergaki, Vassilis C. Nicodemou, Iason Oikonomidis, Antonis Argyros, Anastasios Roussos
Abstract summary: Stress is commonly detected using physiological signals such as perinasal perspiration and heart rate. We propose a multimodal stress estimation framework that combines facial videos and physiological signals. Although evaluated on driving data, the proposed framework and protocol may generalize to other stress estimation settings.
Score: 4.551432404727517
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable stress recognition is critical in applications such as medical monitoring and safety-critical systems, including real-world driving. While stress is commonly detected using physiological signals such as perinasal perspiration and heart rate, facial activity provides complementary cues that can be captured unobtrusively from video. We propose a multimodal stress estimation framework that combines facial videos and physiological signals, remaining effective even when biosignal acquisition is challenging. Facial behavior is represented using a dense 3D Morphable Model, yielding a 56-dimensional descriptor that captures subtle expression and head-pose dynamics over time. To study how stress modulates facial motion, we perform extensive experiments alongside established physiological markers. Paired hypothesis tests between baseline and stressor phases show that 38 of 56 facial components exhibit consistent, phase-specific stress responses comparable to physiological markers. Building on these findings, we introduce a Transformer-based temporal modeling framework and evaluate unimodal, early-fusion, and cross-modal attention strategies. Cross-modal attention fusion of 3D-derived facial features with physiological signals substantially improves performance over physiological signals alone, increasing AUROC from 52.7% and accuracy from 51.0% to 92.0% and 86.7%, respectively. Although evaluated on driving data, the proposed framework and protocol may generalize to other stress estimation settings.
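
The paired comparison between baseline and stressor phases described in the abstract can be illustrated with a small sketch. The abstract does not say which paired test the authors used, so the exact sign-flip permutation test below is a stand-in chosen because it needs only the standard library. The function names, the `alpha` threshold, and the toy data are all hypothetical and not taken from the paper.

```python
import itertools
from statistics import mean


def paired_sign_flip_pvalue(baseline, stressor):
    """Two-sided exact sign-flip permutation test on paired differences.

    Assumption: each position holds one subject's value for a single
    feature component in the baseline and stressor phase respectively.
    """
    if len(baseline) != len(stressor):
        raise ValueError("paired samples must have equal length")
    diffs = [s - b for b, s in zip(baseline, stressor)]
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    # Under the null hypothesis the sign of each difference is arbitrary,
    # so enumerate every sign assignment (2**n of them).
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(sign * d for sign, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float noise
            hits += 1
    return hits / total


def count_responsive(components, alpha=0.05):
    """Count components whose paired test p-value falls below alpha.

    `components` is a list of (baseline, stressor) pairs, one per
    feature component (hypothetical layout).
    """
    return sum(
        1 for base, stress in components
        if paired_sign_flip_pvalue(base, stress) < alpha
    )


if __name__ == "__main__":
    # Toy data: one component that shifts under stress, one that does not.
    shifted = ([0, 0, 0, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7, 8])
    unchanged = ([1, 2, 3], [1, 2, 3])
    print(count_responsive([shifted, unchanged]))
```

The enumeration grows as 2^n in the number of paired samples, so this form only suits small cohorts; a sampled permutation test or a library routine would be the usual choice for larger ones. Testing 56 components would also normally call for a multiple-comparison correction, which this sketch omits.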

Related papers

Dynamic Stress Detection: A Study of Temporal Progression Modelling of Stress in Speech [1.3320917259299652]
We model stress as a temporally evolving phenomenon influenced by historical emotional state. We propose a dynamic labelling strategy that derives fine-grained stress annotations from emotional labels. Our approach achieves notable accuracy gains on MuSE and StressID over existing baselines.
arXiv Detail & Related papers (2025-10-02T06:30:44Z)
CAST-Phys: Contactless Affective States Through Physiological signals Database [74.28082880875368]
The lack of affective multi-modal datasets remains a major bottleneck in developing accurate emotion recognition systems. We present the Contactless Affective States Through Physiological Signals Database (CAST-Phys), a novel high-quality dataset capable of remote physiological emotion recognition. Our analysis highlights the crucial role of physiological signals in realistic scenarios where facial expressions alone may not provide sufficient emotional information.
arXiv Detail & Related papers (2025-07-08T15:20:24Z)
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design. PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z)
Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [46.36100528165335]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring. Yet building accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors. In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type. Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
arXiv Detail & Related papers (2025-02-10T13:33:12Z)
Continuous Wavelet Transformation and VGG16 Deep Neural Network for Stress Classification in PPG Signals [0.22499166814992436]
Our research introduces a groundbreaking approach to stress classification through Photoplethysmogram signals. By incorporating Continuous Wavelet Transformation (CWT) with the proven VGG16, our method enhances stress assessment accuracy and reliability.
arXiv Detail & Related papers (2024-10-17T19:29:52Z)
Investigating the Generalizability of Physiological Characteristics of Anxiety [3.4036712573981607]
We evaluate the generalizability of physiological features that have been shown to be correlated with anxiety and stress to high-arousal emotions. This work is the first cross-corpus evaluation across stress and arousal from ECG and EDA signals, contributing new findings about the generalizability of stress detection.
arXiv Detail & Related papers (2024-01-23T16:49:54Z)
Deep-seeded Clustering for Emotion Recognition from Wearable Physiological Sensors [1.380698851850167]
We propose and test a deep-seeded clustering algorithm that automatically extracts and classifies features from physiological signals with minimal supervision. We show that the model obtains good performance results across three different datasets frequently used in affective computing studies.
arXiv Detail & Related papers (2023-08-17T14:37:35Z)
Semantic-aware One-shot Face Re-enactment with Dense Correspondence Estimation [100.60938767993088]
One-shot face re-enactment is a challenging task due to the identity mismatch between source and driving faces. This paper proposes to use 3D Morphable Model (3DMM) for explicit facial semantic decomposition and identity disentanglement.
arXiv Detail & Related papers (2022-11-23T03:02:34Z)
The Face of Affective Disorders [7.4005714204825646]
We study the statistical properties of facial behaviour altered by the regulation of brain arousal in the clinical domain of psychiatry. We name the presented measurement in the sense of the classical scalp based obtrusive sensors Opto Electronic Encephalography (OEG) which relies solely on modern camera based real-time signal processing and computer vision.
arXiv Detail & Related papers (2022-08-02T11:28:17Z)
Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition [49.42127182149948]
Recent studies have revealed the vulnerability of face recognition models against physical adversarial patches. We propose to simulate the complex transformations of faces in the physical world via 3D-face modeling. We further propose a Face3DAdv method considering the 3D face transformations and realistic physical variations.
arXiv Detail & Related papers (2022-03-09T10:21:40Z)
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces. We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments. Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z)
Robust and Precise Facial Landmark Detection by Self-Calibrated Pose Attention Network [73.56802915291917]
We propose a semi-supervised framework to achieve more robust and precise facial landmark detection. A Boundary-Aware Landmark Intensity (BALI) field is proposed to model more effective facial shape constraints. A Self-Calibrated Pose Attention (SCPA) model is designed to provide a self-learned objective function that enforces intermediate supervision.
arXiv Detail & Related papers (2021-12-23T02:51:08Z)
StressNet: Detecting Stress in Thermal Videos [10.453959171422147]
This paper presents a novel approach to obtaining physiological signals and classifying stress states from thermal video. "StressNet" reconstructs the ISTI (Initial Systolic Time Interval), a measure of change in cardiac sympathetic activity that is considered to be a quantitative index of stress in humans. A detailed evaluation demonstrates that StressNet estimated the ISTI signal with 95% accuracy and detected stress with an average precision of 0.842.
arXiv Detail & Related papers (2020-11-18T20:47:23Z)
Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer [51.926868759681014]
We present a framework to predict the facial parameters based on a bone-driven face model (BDFM) under different views. The proposed framework consists of a feature extractor, a generator, and a facial parameter regressor.
arXiv Detail & Related papers (2020-08-20T09:49:13Z)
Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations. We then use the distilled physiological features for robust multi-task physiological measurements. The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)
Detecting Parkinsonian Tremor from IMU Data Collected In-The-Wild using Deep Multiple-Instance Learning [59.74684475991192]
Parkinson's Disease (PD) is a slowly evolving neurological disease that affects about 1% of the population above 60 years old. PD symptoms include tremor, rigidity and bradykinesia. We present a method for automatically identifying tremorous episodes related to PD, based on IMU signals captured via a smartphone device.
arXiv Detail & Related papers (2020-05-06T09:02:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.