FacePhys: State of the Heart Learning
- URL: http://arxiv.org/abs/2512.06275v1
- Date: Sat, 06 Dec 2025 03:54:12 GMT
- Title: FacePhys: State of the Heart Learning
- Authors: Kegang Wang, Jiankai Tang, Yuntao Wang, Xin Liu, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Daniel McDuff,
- Abstract summary: FacePhys captures subtle periodic variations across video frames while maintaining a minimal computational overhead. Our solution enables real-time inference with a memory footprint of 3.6 MB and per-frame latency of 9.46 ms. These results translate into reliable real-time performance in practical deployments.
- Score: 29.799216245524466
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac measurement through minute changes in light reflected from the skin. However, practical deployment is limited by the computational constraints of performing analysis on front-end devices and by the accuracy degradation that occurs when data are transmitted through compressive channels that reduce signal quality. We propose a memory-efficient rPPG algorithm, FacePhys, built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time operation. Leveraging a transferable heart state, FacePhys captures subtle periodic variations across video frames while maintaining a minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. FacePhys establishes a new state of the art, with a substantial 49% reduction in error. Our solution enables real-time inference with a memory footprint of 3.6 MB and per-frame latency of 9.46 ms, surpassing existing methods by 83% to 99%. These results translate into reliable real-time performance in practical deployments, and a live demo is available at https://www.facephys.com/.
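As background for how rPPG recovers a pulse from reflected light, here is a classical green-channel baseline, sketched in Python under stated assumptions. It is not the FacePhys method: it assumes the frames are already cropped to skin, averages the green channel per frame, removes linear drift, and reads heart rate off the strongest spectral peak inside an assumed 0.7 to 4 Hz band (42 to 240 bpm). The function name `estimate_hr_bpm` and the band limits are illustrative choices.

```python
import numpy as np

def estimate_hr_bpm(frames: np.ndarray, fps: float,
                    lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Classical green-channel rPPG baseline (not FacePhys).

    frames: array of shape (T, H, W, 3), assumed to contain only skin pixels.
    fps: frame rate in Hz.
    Returns the dominant frequency inside [lo_hz, hi_hz], in beats per minute.
    """
    # Spatially average the green channel: one sample per frame.
    signal = frames[..., 1].reshape(frames.shape[0], -1).mean(axis=1)
    # Remove the DC level and slow linear drift before spectral analysis.
    t = np.arange(signal.size)
    signal = signal - np.polyval(np.polyfit(t, signal, 1), t)
    # Magnitude spectrum of the Hann-windowed, real-valued trace.
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size)))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    # Keep only physiologically plausible frequencies (band-pass by masking).
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz
```

For example, a synthetic 10 s clip at 30 fps whose brightness oscillates at 1.2 Hz yields an estimate of about 72 bpm.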
Related papers
- Locally Adaptive Decay Surfaces for High-Speed Face and Landmark Detection with Event Cameras [2.467339701756281]
Event cameras record luminance changes with microsecond resolution. Converting their sparse, asynchronous output into dense tensors that neural networks can exploit remains a core challenge. We introduce Locally Adaptive Decay Surfaces (LADS), a family of event representations in which the temporal decay at each location is modulated according to local signal dynamics.
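The idea of a per-location decay can be illustrated with a time surface in which each pixel's value decays exponentially with the time elapsed since its last event. The sketch below assumes a hypothetical modulation rule (pixels with more events in the window decay faster, with time constants between `tau_min` and `tau_max`); it shows the general mechanism only and is not the rule LADS actually uses. The function name `adaptive_decay_surface` is also illustrative.

```python
import numpy as np

def adaptive_decay_surface(xs, ys, ts, t_ref, shape,
                           tau_min=0.005, tau_max=0.05):
    """Build a dense surface from events, evaluated at reference time t_ref.

    xs, ys: integer pixel coordinates; ts: event timestamps in seconds (<= t_ref).
    Each pixel decays as exp(-(t_ref - t_last) / tau), where tau is set per pixel
    from local activity (HYPOTHETICAL rule: busier pixels decay faster).
    """
    h, w = shape
    t_last = np.full((h, w), -np.inf)
    counts = np.zeros((h, w))
    # Most recent timestamp and number of events at every pixel.
    np.maximum.at(t_last, (ys, xs), ts)
    np.add.at(counts, (ys, xs), 1.0)
    # Map normalized activity in [0, 1] to a decay constant in [tau_min, tau_max].
    activity = counts / max(counts.max(), 1.0)
    tau = tau_max - (tau_max - tau_min) * activity
    # Pixels that never fired have t_last = -inf, so their value is exp(-inf) = 0.
    return np.exp(-(t_ref - t_last) / tau)
```

The intent of this hypothetical rule is to keep busy regions from saturating while quiet regions retain a longer memory; the summary does not say how LADS itself sets the decay.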
arXiv Detail & Related papers (2026-02-26T15:16:04Z)
- Event-based Visual Deformation Measurement [76.25283405575108]
Visual Deformation Measurement aims to recover dense deformation fields by tracking surface motion from camera observations. Traditional image-based methods rely on minimal inter-frame motion to constrain the correspondence search space. We propose an event-frame fusion framework that exploits events for temporally dense motion cues and frames for spatially dense, precise estimation.
arXiv Detail & Related papers (2026-02-16T01:04:48Z)
- Editing Physiological Signals in Videos Using Latent Representations [1.1688456044134343]
Camera-based heart rate (HR) measurement offers a non-contact means of monitoring an individual's health. However, the presence of vital signs in facial videos raises significant privacy concerns. We propose a method that edits physiological signals in videos while preserving visual fidelity. Its controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or generating realistic videos with vital signs.
arXiv Detail & Related papers (2025-09-29T18:02:50Z)
- PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement [63.007237197267834]
Existing deep learning methods for physiological monitoring mostly lack theoretical robustness. We propose a physics-informed rPPG paradigm derived from the Navier-Stokes equations of hemodynamics, showing that the pulse signal follows a second-order system. This provides a theoretical justification for using a Temporal Convolutional Network (TCN). PHASE-Net achieves state-of-the-art performance with strong efficiency, offering a theoretically grounded and deployment-ready rPPG solution.
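To make "second-order system" concrete, the following sketch integrates the canonical damped second-order ODE y'' + 2*zeta*omega_n*y' + omega_n^2*y = omega_n^2*u(t) with semi-implicit Euler steps. It is a generic textbook model with illustrative parameters (a 1.2 Hz natural frequency, i.e. 72 bpm, and damping ratio 0.2), not the hemodynamic equation PHASE-Net derives, whose form and coefficients the summary does not give. The function name `simulate_second_order` is hypothetical.

```python
import numpy as np

def simulate_second_order(u, dt, omega_n, zeta, y0=0.0, v0=0.0):
    """Integrate y'' + 2*zeta*omega_n*y' + omega_n**2 * y = omega_n**2 * u(t).

    u: input samples (one per step), dt: step size in seconds,
    omega_n: natural frequency in rad/s, zeta: damping ratio.
    Uses semi-implicit (symplectic) Euler, which is stable for small dt.
    """
    y, v = y0, v0
    out = np.empty(len(u))
    for k, u_k in enumerate(u):
        # Acceleration implied by the linear second-order dynamics.
        a = omega_n ** 2 * (u_k - y) - 2.0 * zeta * omega_n * v
        v += dt * a  # update velocity first ...
        y += dt * v  # ... then position, using the new velocity
        out[k] = y
    return out

# Unit-step response of a lightly damped system with a 1.2 Hz natural frequency.
dt = 1e-3
t = np.arange(0.0, 20.0, dt)
y = simulate_second_order(np.ones_like(t), dt, omega_n=2 * np.pi * 1.2, zeta=0.2)
```

Driven by a unit step, this underdamped system overshoots to roughly 1.5 and then rings at close to its natural frequency while settling to 1; that decaying oscillation is the kind of behavior a second-order model captures.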
arXiv Detail & Related papers (2025-09-29T14:36:45Z)
- Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
- Memory-efficient Low-latency Remote Photoplethysmography through Temporal-Spatial State Space Duality [15.714133129768323]
ME-rPPG is a memory-efficient algorithm built on temporal-spatial state space duality. It efficiently captures subtle periodic variations across facial frames while maintaining minimal computational overhead. Our solution enables real-time inference with only 3.6 MB memory usage and 9.46 ms latency.
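The constant memory footprint claimed here is characteristic of recurrent state space models, and "state space duality", in the sense used by recent state space models, refers to the fact that the same linear recurrence can also be evaluated as a single masked matrix product over the whole sequence. The sketch below illustrates that equivalence for a generic single-layer model with a scalar decay `a`; it is not the ME-rPPG or FacePhys architecture, and the names `StreamingSSM` and `ssm_matrix_form`, as well as any parameters used with them, are hypothetical.

```python
import numpy as np

class StreamingSSM:
    """Generic linear state space model with a scalar decay (illustrative only).

    h_t = a * h_{t-1} + B @ x_t,   y_t = C @ h_t
    Only the state h (size N) is kept between frames, so memory per step
    stays constant no matter how many frames have been processed.
    """

    def __init__(self, a: float, B: np.ndarray, C: np.ndarray):
        self.a, self.B, self.C = a, B, C
        self.h = np.zeros(B.shape[0])

    def step(self, x_t: np.ndarray) -> np.ndarray:
        # Decay the old state, add the new input, and read out.
        self.h = self.a * self.h + self.B @ x_t
        return self.C @ self.h

def ssm_matrix_form(a: float, B: np.ndarray, C: np.ndarray,
                    X: np.ndarray) -> np.ndarray:
    """Equivalent whole-sequence ("dual") form: Y = L @ X @ (C @ B).T,
    where L[t, s] = a**(t - s) for s <= t and 0 otherwise (causal mask)."""
    T = X.shape[0]
    idx = np.arange(T)
    diff = idx[:, None] - idx[None, :]
    L = np.where(diff >= 0, a ** np.clip(diff, 0, None), 0.0)
    return L @ X @ (C @ B).T
```

The recurrent form keeps only the N-dimensional state between frames, which suits low-latency streaming inference, while the matrix form processes a whole clip at once, which suits parallel training on long sequences; the summary does not specify how ME-rPPG divides these roles.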
arXiv Detail & Related papers (2025-04-02T14:34:04Z)
- FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks [21.076600109388394]
Depression is a prevalent mental health disorder that significantly impacts individuals' lives and well-being.
Many recent end-to-end deep learning methods leverage facial expression features for automatic depression detection.
We propose a novel framework called FacialPulse, which recognizes depression with high accuracy and speed.
arXiv Detail & Related papers (2024-08-07T01:50:34Z)
- PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer-based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z)
- STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to address two problems in high-resolution video prediction.
The proposed model aims to preserve the spatiotemporal information of videos during feature extraction and state transitions.
Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)