egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
- URL: http://arxiv.org/abs/2502.20879v2
- Date: Wed, 06 Aug 2025 07:45:37 GMT
- Title: egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
- Authors: Björn Braun, Rayan Armani, Manuel Meier, Max Moebus, Christian Holz
- Abstract summary: Egocentric vision systems aim to understand the surroundings and the wearer's behavior within them, including motions, activities, and interactions. We argue that egocentric systems must additionally detect physiological states to capture a person's attention and situational responses. We introduce PulseFormer, a method to extract heart rate as a key indicator of physiological state from the eye-tracking cameras on egocentric vision systems.
- Score: 19.969886981165754
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Egocentric vision systems aim to understand the spatial surroundings and the wearer's behavior within them, including motions, activities, and interactions. We argue that egocentric systems must additionally detect physiological states to capture a person's attention and situational responses, which are critical for context-aware behavior modeling. In this paper, we propose egoPPG, a novel vision task for egocentric systems to recover a person's cardiac activity to aid downstream vision tasks. We introduce PulseFormer, a method to extract heart rate as a key indicator of physiological state from the eye-tracking cameras on unmodified egocentric vision systems. PulseFormer continuously estimates the photoplethysmogram (PPG) from areas around the eyes and fuses motion cues from the headset's inertial measurement unit (IMU) to track HR values. We demonstrate egoPPG's downstream benefit for a key task on EgoExo4D, an existing egocentric dataset, where PulseFormer's HR estimates improve proficiency estimation by 14%. To train and validate PulseFormer, we collected a dataset of 13+ hours of eye-tracking videos from Project Aria, contact-based PPG signals, and an electrocardiogram (ECG) for ground-truth HR values. Similar to EgoExo4D, 25 participants performed diverse everyday activities such as office work, cooking, dancing, and exercising, which induced significant natural motion and HR variation (44-164 bpm). Our model robustly estimates HR (MAE=7.67 bpm) and captures HR patterns well (r=0.85). Our results show how egocentric systems may unify environmental and physiological tracking to better understand users, and that egoPPG, as a complementary task, provides meaningful augmentations for existing datasets and tasks. We release our code, dataset, and HR augmentations for EgoExo4D to inspire research on physiology-aware egocentric tasks.
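To make the described pipeline concrete, here is a minimal, hypothetical sketch of the two stages the abstract names: a temporal model that maps eye-region crops plus IMU motion cues to a PPG waveform, and a spectral step that reads HR off the waveform's dominant peak. All module names, shapes, and the additive fusion below are illustrative assumptions, not the released PulseFormer implementation.

```python
import torch
import torch.nn as nn

class PulseEstimatorSketch(nn.Module):
    """Illustrative stand-in for PulseFormer: eye-region video + IMU -> PPG."""

    def __init__(self, d_model: int = 64):
        super().__init__()
        # Per-frame encoder for grayscale eye-region crops of shape (1, 64, 64).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, d_model),
        )
        # Project 6-axis IMU samples (accelerometer + gyroscope) per frame.
        self.imu_proj = nn.Linear(6, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # one PPG sample per frame

    def forward(self, frames, imu):
        # frames: (B, T, 1, 64, 64); imu: (B, T, 6)
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        fused = feats + self.imu_proj(imu)   # additive motion-cue fusion
        return self.head(self.temporal(fused)).squeeze(-1)  # (B, T) PPG waveform

def hr_from_ppg(ppg: torch.Tensor, fps: float) -> torch.Tensor:
    """HR in bpm from the dominant spectral peak in 0.7-2.8 Hz (42-168 bpm)."""
    spec = torch.fft.rfft(ppg - ppg.mean(-1, keepdim=True), dim=-1).abs()
    freqs = torch.fft.rfftfreq(ppg.shape[-1], d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 2.8)
    peak = spec[..., band].argmax(dim=-1)
    return 60.0 * freqs[band][peak]
```

For a 30-second clip at, say, 10 fps, `hr_from_ppg(model(frames, imu), fps=10.0)` yields one HR value per batch element; the 0.7-2.8 Hz band corresponds to 42-168 bpm, bracketing the 44-164 bpm range the abstract reports.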
Related papers
- Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention [58.05340906967343]
Egocentric Referring Video Object Segmentation (Ego-RVOS) aims to segment the specific object actively involved in a human action, as described by a language query, within first-person videos. Existing methods often struggle, learning spurious correlations from skewed object-action pairings in datasets. We introduce CERES, a plug-in causal framework that adapts strong, pre-trained RVOS models to the egocentric domain.
arXiv Detail & Related papers (2025-12-30T16:22:14Z)
- Eyes on Target: Gaze-Aware Object Detection in Egocentric Video [1.3320917259299652]
We propose Eyes on Target, a novel depth-aware and gaze-guided object detection framework for egocentric videos. Our approach injects gaze-derived features into the attention mechanism of a Vision Transformer (ViT), effectively biasing spatial feature selection toward human-attended regions. We validate our method on an egocentric simulator dataset where human visual attention is critical for task assessment.
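As a rough illustration of the gaze-biasing idea summarized above, the sketch below adds a patch-pooled gaze heatmap to the attention logits of one self-attention step, so tokens in human-attended regions receive more weight. The function, the additive-bias form, and the CLS-free 14x14 patch grid are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gaze_biased_attention(q, k, v, gaze_map, patch_grid=(14, 14), scale=2.0):
    """q, k, v: (B, heads, N, d) with N = 14*14 patch tokens;
    gaze_map: (B, 1, H, W) fixation heatmap normalized to [0, 1]."""
    d = q.shape[-1]
    # Pool the gaze heatmap down to one salience value per patch token.
    gaze = F.adaptive_avg_pool2d(gaze_map, patch_grid).flatten(2)  # (B, 1, N)
    logits = q @ k.transpose(-2, -1) / d**0.5                      # (B, heads, N, N)
    # Bias every query toward keys in highly gazed-at patches.
    logits = logits + scale * gaze.unsqueeze(1)   # broadcasts over heads/queries
    return torch.softmax(logits, dim=-1) @ v
```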
arXiv Detail & Related papers (2025-11-03T05:21:58Z)
- egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks [26.06615078274544]
egoEMOTION is the first dataset that couples egocentric visual and physiological signals with dense self-reports of emotion and personality. Our dataset includes over 50 hours of recordings from 43 participants, captured using Meta's Project Aria glasses.
arXiv Detail & Related papers (2025-10-25T03:04:51Z)
- ECHO: Ego-Centric modeling of Human-Object interactions [71.17118015822699]
We develop ECHO (Ego-Centric modeling of Human-Object interactions), which recovers three modalities (human pose, object motion, and contact) from minimal egocentric observation. It outperforms existing methods that do not offer the same flexibility.
arXiv Detail & Related papers (2025-08-29T12:12:22Z)
- Egocentric Human-Object Interaction Detection: A New Benchmark and Method [15.271558280695631]
Egocentric human-object interaction (Ego-HOI) detection is crucial for intelligent agents to understand and assist human activities from a first-person perspective. We introduce the real-world Ego-HOI detection task and Ego-HOIBench, a new dataset with over 27K egocentric images and explicit, fine-grained hand-verb-object triplet annotations. We propose Hand Geometry and Interactivity Refinement (HGIR), a lightweight, plug-and-play scheme that leverages hand pose and geometric cues to enhance interaction representations.
arXiv Detail & Related papers (2025-06-17T05:03:42Z)
- EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding [43.66860935790616]
EgoExOR is the first operating room (OR) dataset to fuse first-person and third-person perspectives. It integrates egocentric data (RGB, gaze, hand tracking, audio) from wearable glasses, exocentric RGB and depth from RGB-D cameras, and ultrasound imagery. We evaluate the surgical scene graph generation performance of two adapted state-of-the-art models.
arXiv Detail & Related papers (2025-05-30T07:02:00Z)
- EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance. It encodes anatomical knowledge and motion-induced visual dynamics. It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z)
- EgoLife: Towards Egocentric Life Assistant [60.51196061794498]
We introduce EgoLife, a project to develop an egocentric life assistant that accompanies users and enhances their personal efficiency through AI-powered wearable glasses.
We conduct a comprehensive data collection study where six participants lived together for one week, continuously recording their daily activities using AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references.
This effort resulted in the EgoLife dataset, a comprehensive 300-hour egocentric, interpersonal, multiview, and multimodal daily life dataset with intensive annotation.
We introduce EgoLifeQA, a suite of long-context, life-oriented question-answering tasks designed to provide meaningful assistance in daily life.
arXiv Detail & Related papers (2025-03-05T18:54:16Z)
- Estimating Body and Hand Motion in an Ego-sensed World [62.61989004520802]
We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
arXiv Detail & Related papers (2024-10-04T17:59:57Z)
- EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views [51.53089073920215]
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception.
Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view.
We present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance.
arXiv Detail & Related papers (2024-05-22T14:03:48Z)
- Chaos in Motion: Unveiling Robustness in Remote Heart Rate Measurement through Brain-Inspired Skin Tracking [7.688280190165613]
Existing remote heart rate measurement methods have three serious problems.
We apply chaos theory to computer vision tasks for the first time, yielding a brain-inspired framework.
Our method, called HR-RST, achieves robust skin tracking for heart rate measurement.
arXiv Detail & Related papers (2024-04-11T12:26:10Z)
- EgoGen: An Egocentric Synthetic Data Generator [53.32942235801499]
EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.
At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment.
We demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views.
arXiv Detail & Related papers (2024-01-16T18:55:22Z)
- Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion.
arXiv Detail & Related papers (2022-12-09T02:25:20Z)
- BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition process between attention-related and neglected objects to identify and locate, in an unsupervised manner, the visual objects in a movie frame that the human brain focuses on.
arXiv Detail & Related papers (2022-10-27T22:20:36Z)
- Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data [72.1187887376849]
The selective attention mechanism helps the cognitive system focus on task-relevant visual cues by ignoring the presence of distractors.
We propose a framework to leverage gaze for medical image analysis tasks with small training data.
Our method is demonstrated to achieve superior performance on both 3D tumor segmentation and 2D chest X-ray classification tasks.
arXiv Detail & Related papers (2021-12-02T07:55:25Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Self-supervised transfer learning of physiological representations from free-living wearable data [12.863826659440026]
We present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels.
We evaluate our model on the largest free-living combined-sensing dataset (comprising >280k hours of wrist accelerometer & wearable ECG data).
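If the pretext task here is, as the summary suggests, predicting heart rate from raw activity windows, a minimal sketch might look like the following; the architecture, window size, and names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class ActivityToHR(nn.Module):
    """Hypothetical pretext model: accelerometer window -> mean HR."""

    def __init__(self, d: int = 128):
        super().__init__()
        # 3-axis wrist accelerometer window, e.g. (B, 3, 3000) = 100 s at 30 Hz.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 32, 9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, 9, stride=4, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, d),
        )
        self.hr_head = nn.Linear(d, 1)  # HR acts as the free supervisory signal

    def forward(self, accel):
        z = self.encoder(accel)         # embedding reused for downstream transfer
        return self.hr_head(z), z
```

After pretraining, the embedding `z` is what transfers: frozen features plus a linear probe, in line with the transfer-learning evaluation mentioned in the next entry.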
arXiv Detail & Related papers (2020-11-18T23:21:34Z)
- Learning Generalizable Physiological Representations from Large-scale Wearable Data [12.863826659440026]
We present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels.
We show that the resulting embeddings can generalize in various downstream tasks through transfer learning with linear classifiers.
Overall, we propose the first multimodal self-supervised method for behavioral and physiological data with implications for large-scale health and lifestyle monitoring.
arXiv Detail & Related papers (2020-11-09T17:56:03Z)
- Sensorimotor Visual Perception on Embodied System Using Free Energy Principle [0.0]
We propose an embodied system based on the free energy principle (FEP) for sensorimotor visual perception.
We evaluate it in a character-recognition task using the MNIST dataset.
arXiv Detail & Related papers (2020-06-11T05:03:45Z)