egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks
- URL: http://arxiv.org/abs/2510.22129v1
- Date: Sat, 25 Oct 2025 03:04:51 GMT
- Title: egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks
- Authors: Matthias Jammot, Björn Braun, Paul Streli, Rafael Wampfler, Christian Holz
- Abstract summary: egoEMOTION is the first dataset that couples egocentric visual and physiological signals with dense self-reports of emotion and personality. Our dataset includes over 50 hours of recordings from 43 participants, captured using Meta's Project Aria glasses.
- Score: 26.06615078274544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding affect is central to anticipating human behavior, yet current egocentric vision benchmarks largely ignore the person's emotional states that shape their decisions and actions. Existing tasks in egocentric perception focus on physical activities, hand-object interactions, and attention modeling, assuming neutral affect and uniform personality. This limits the ability of vision systems to capture key internal drivers of behavior. In this paper, we present egoEMOTION, the first dataset that couples egocentric visual and physiological signals with dense self-reports of emotion and personality across controlled and real-world scenarios. Our dataset includes over 50 hours of recordings from 43 participants, captured using Meta's Project Aria glasses. Each session provides synchronized eye-tracking video, head-mounted photoplethysmography, inertial motion data, and physiological baselines for reference. Participants completed emotion-elicitation tasks and naturalistic activities while self-reporting their affective state using the Circumplex Model and Mikels' Wheel, as well as their personality via the Big Five model. We define three benchmark tasks: (1) continuous affect classification (valence, arousal, dominance); (2) discrete emotion classification; and (3) trait-level personality inference. We show that a classical learning-based method, used as a simple baseline for real-world affect prediction, produces better estimates from the signals captured by egocentric vision systems than from the reference physiological signals. Our dataset establishes emotion and personality as core dimensions in egocentric perception and opens new directions in affect-driven modeling of behavior, intent, and interaction.
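To make benchmark task (1) concrete, the sketch below shows the kind of classical, subject-wise baseline the abstract alludes to. It is a minimal illustration only: the feature matrix, the three-level binning of valence/arousal/dominance, and the window/feature counts are assumptions for demonstration, not the released egoEMOTION data format or the paper's exact baseline.

```python
# Hypothetical sketch of a classical baseline for egoEMOTION-style affect classification.
# The data below is synthetic; a real pipeline would replace X with per-window features
# (e.g., gaze statistics, PPG-derived heart rate, IMU energy) and y with self-report labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

n_windows, n_features = 2000, 24                      # assumed: windowed, handcrafted features
X = rng.normal(size=(n_windows, n_features))          # placeholder feature matrix
participants = rng.integers(0, 43, size=n_windows)    # 43 participants, used for subject-wise splits

# Assumption: continuous valence/arousal/dominance binned into low/mid/high classes.
labels = {
    "valence": rng.integers(0, 3, size=n_windows),
    "arousal": rng.integers(0, 3, size=n_windows),
    "dominance": rng.integers(0, 3, size=n_windows),
}

cv = GroupKFold(n_splits=5)  # keep each participant's windows in a single fold (no subject leakage)
for dim, y in labels.items():
    scores = []
    for train_idx, test_idx in cv.split(X, y, groups=participants):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), average="macro"))
    print(f"{dim}: macro-F1 = {np.mean(scores):.2f}")
```

The grouped cross-validation is the one deliberate design choice here: evaluating affect recognition with windows from the same participant in both train and test folds would overstate performance.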
Related papers
- Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention [58.05340906967343]
Egocentric Referring Video Object Segmentation (Ego-RVOS) aims to segment the specific object actively involved in a human action, as described by a language query, within first-person videos.
Existing methods often struggle, learning spurious correlations from skewed object-action pairings in datasets.
We introduce Causal-REferring (CERES), a plug-in causal framework that adapts strong, pre-trained RVOS models to the egocentric domain.
arXiv Detail & Related papers (2025-12-30T16:22:14Z) - Modelling the Interplay of Eye-Tracking Temporal Dynamics and Personality for Emotion Detection in Face-to-Face Settings [1.2600839346487007]
This work presents a personality-aware multimodal framework that integrates eye-tracking sequences, Big Five personality traits, and contextual stimulus cues to predict both perceived and felt emotions.
Results show that stimulus cues strongly enhance perceived-emotion predictions, while personality traits provide the largest improvements for felt-emotion recognition.
arXiv Detail & Related papers (2025-09-19T16:05:23Z) - Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids.
PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues.
We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
arXiv Detail & Related papers (2025-05-18T07:33:31Z) - Modelling Emotions in Face-to-Face Setting: The Interplay of Eye-Tracking, Personality, and Temporal Dynamics [1.4645774851707578]
In this study, we showcase how integrating eye-tracking data, temporal dynamics, and personality traits can substantially enhance the detection of both perceived and felt emotions.
Our findings inform the design of future affective computing and human-agent systems.
arXiv Detail & Related papers (2025-03-18T13:15:32Z) - egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks [19.969886981165754]
Egocentric vision systems aim to understand the surroundings and the wearer's behavior within them, including motions, activities, and interactions.
We argue that egocentric systems must additionally detect physiological states to capture a person's attention and situational responses.
We introduce PulseFormer, a method to extract heart rate, as a key indicator of physiological state, from the eye-tracking cameras on egocentric vision systems.
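The paper's method is learning-based; as a rough illustration of the underlying idea, recovering a pulse rate from a PPG-like periocular intensity signal, the sketch below picks the dominant spectral peak in a plausible heart-rate band. The signal, frame rate, and band limits are assumptions for illustration and are not taken from the egoPPG paper.

```python
# Illustrative only: estimate heart rate from a synthetic PPG-like signal by
# locating the strongest spectral peak in an assumed heart-rate band.
import numpy as np
from scipy.signal import welch

fs = 30.0                        # assumed camera frame rate in Hz
t = np.arange(0, 60, 1 / fs)     # 60 s of samples
true_hr_hz = 72 / 60.0           # simulate a 72 bpm pulse
signal = np.sin(2 * np.pi * true_hr_hz * t) + 0.5 * np.random.randn(t.size)

# Welch periodogram, then restrict to 0.7-3.0 Hz (roughly 42-180 bpm).
freqs, psd = welch(signal, fs=fs, nperseg=512)
band = (freqs >= 0.7) & (freqs <= 3.0)
hr_bpm = 60.0 * freqs[band][np.argmax(psd[band])]
print(f"Estimated heart rate: {hr_bpm:.1f} bpm")
```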
arXiv Detail & Related papers (2025-02-28T09:23:40Z) - PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality [44.15145632980038]
PersonalityScanner is a VR simulator designed to stimulate cognitive processes and simulate daily behaviors.
We collect a synchronized multi-modal dataset with ten modalities, including first- and third-person video, audio, text, eye tracking, facial micro-expressions, pose, depth data, logs, and inertial measurement unit (IMU) signals.
arXiv Detail & Related papers (2024-07-29T06:17:41Z) - EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views [51.53089073920215]
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception.
Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view.
We present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance.
arXiv Detail & Related papers (2024-05-22T14:03:48Z) - EgoGen: An Egocentric Synthetic Data Generator [53.32942235801499]
EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.
At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment.
We demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views.
arXiv Detail & Related papers (2024-01-16T18:55:22Z) - MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain [23.598727613908853]
We present MECCANO, a dataset of egocentric videos for studying human behavior understanding in industrial-like settings.
The multimodality is characterized by the presence of gaze signals, depth maps, and RGB videos acquired simultaneously with a custom headset.
The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first-person view.
arXiv Detail & Related papers (2022-09-19T00:52:42Z) - The world seems different in a social context: a neural network analysis of human experimental data [57.729312306803955]
We show that it is possible to replicate human behavioral data in both individual and social task settings by modifying the precision of prior and sensory signals.
An analysis of the neural activation traces of the trained networks provides evidence that information is coded in fundamentally different ways in the network in the individual and in the social conditions.
arXiv Detail & Related papers (2022-03-03T17:19:12Z) - Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.