Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
- URL: http://arxiv.org/abs/2503.11465v1
- Date: Fri, 14 Mar 2025 14:50:58 GMT
- Title: Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
- Authors: Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shuo Chen, Jian Yang
- Abstract summary: We propose an end-to-end supervised model for remote photoplethysmography (rPPG). It strives to eliminate complex and unknown external time-varying interferences. This is the first robust rPPG model for real outdoor natural face videos.
- Score: 26.913899198659436
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Physiological activities can be manifested by sensitive changes in facial imaging. While they are barely observable to our eyes, computer vision methods can detect them, and the derived remote photoplethysmography (rPPG) has shown considerable promise. However, existing studies mainly rely on spatial skin recognition and temporal rhythmic interactions, so they focus on identifying explicit features under ideal lighting conditions, but perform poorly in-the-wild with intricate obstacles and extreme illumination exposure. In this paper, we propose an end-to-end video transformer model for rPPG. It strives to eliminate complex and unknown external time-varying interferences, whether they are strong enough to overwhelm subtle biosignal amplitudes or exist as periodic perturbations that hinder network training. In the specific implementation, we utilize global interference sharing, subject background reference, and self-supervised disentanglement to eliminate interference, and further guide learning with spatiotemporal filtering, reconstruction guidance, and frequency-domain and biological prior constraints to achieve effective rPPG. To the best of our knowledge, this is the first robust rPPG model for real outdoor scenarios based on natural face videos, and it is lightweight to deploy. Extensive experiments show the competitiveness of our model in rPPG prediction across datasets and scenes.
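The frequency-domain and biological prior constraints mentioned above rest on a simple fact: the pulse occupies a narrow, physiologically bounded frequency band. A minimal sketch of this idea (not the paper's model — just a generic rPPG baseline using NumPy/SciPy) band-passes a mean facial color trace to the plausible pulse band and reads the heart rate off the dominant spectral peak. The function name and the synthetic trace below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_heart_rate(trace, fps=30.0):
    """Estimate heart rate (BPM) from a mean facial color trace.

    Generic rPPG baseline: zero-mean the trace, band-pass it to the
    plausible pulse band (0.7-4 Hz, i.e. 42-240 BPM), then take the
    dominant FFT peak inside that band.
    """
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)
    x = filtfilt(b, a, x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic 20 s trace: a weak 1.2 Hz (72 BPM) pulse buried in noise,
# standing in for a real mean-skin-pixel signal.
t = np.arange(0, 20, 1.0 / 30.0)
rng = np.random.default_rng(0)
trace = 0.1 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0.0, 0.2, t.size)
print(estimate_heart_rate(trace))  # close to 72 BPM
```

In-the-wild interference (motion, exposure changes) shares this band with the pulse, which is precisely why a fixed band-pass alone fails there and the disentanglement machinery above is needed.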
Related papers
- CodePhys: Robust Video-based Remote Physiological Measurement through Latent Codebook Querying [26.97093819822487]
Remote photoplethysmography aims to measure non-contact physiological signals from facial videos. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Such methods are easily affected by interference and degradation, resulting in noisy rPPG signals. We propose a novel method named CodePhys, which innovatively treats rPPG measurement as a code query task in a noise-free proxy space.
arXiv Detail & Related papers (2025-02-11T13:05:42Z) - Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors [52.195637608631955]
Non-line-of-sight (NLOS) imaging has attracted increasing attention due to its potential applications.
Existing NLOS reconstruction approaches are constrained by the reliance on empirical physical priors.
We introduce a novel learning-based solution comprising two key designs: Learnable Path Compensation (LPC) and Adaptive Phasor Field (APF).
arXiv Detail & Related papers (2024-09-21T04:39:45Z) - Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement [26.480515954528848]
We propose a novel framework that successfully integrates popular vision-language models into a remote physiological measurement task. We develop a series of generative and contrastive learning mechanisms to optimize the framework. Our method for the first time adapts VLMs to digest and align the frequency-related knowledge in vision and text modalities.
arXiv Detail & Related papers (2024-07-11T13:45:50Z) - Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography [5.743550396843244]
MAR-r is a framework that integrates the impact of ROI localization and complex motion artifacts.
MAR-r employs a masked attention regularization mechanism in the rPPG field to capture semantic consistency of facial clips.
It also employs a masking technique to prevent the model from overfitting on inaccurate ROIs and subsequently degrading its performance.
arXiv Detail & Related papers (2024-07-09T08:25:30Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - Practical Exposure Correction: Great Truths Are Always Simple [65.82019845544869]
We establish a Practical Exposure Corrector (PEC) that assembles the characteristics of efficiency and performance.
We introduce an exposure adversarial function as the key engine to fully extract valuable information from the observation.
Our experiments fully reveal the superiority of our proposed PEC.
arXiv Detail & Related papers (2022-12-29T09:52:13Z) - DRNet: Decomposition and Reconstruction Network for Remote Physiological
Measurement [39.73408626273354]
Existing methods are generally divided into two groups.
The first focuses on mining the subtle blood volume pulse (BVP) signals from face videos, but seldom explicitly models the noises that dominate face video content.
The second focuses on modeling noisy data directly, resulting in suboptimal performance due to the lack of regularity of these severe random noises.
arXiv Detail & Related papers (2022-06-12T07:40:10Z) - LTT-GAN: Looking Through Turbulence by Inverting GANs [86.25869403782957]
We propose the first turbulence mitigation method that makes use of visual priors encapsulated by a well-trained GAN.
Based on the visual priors, we propose to learn to preserve the identity of restored images on a periodic contextual distance.
Our method significantly outperforms prior art in both the visual quality and face verification accuracy of restored results.
arXiv Detail & Related papers (2021-12-04T16:42:13Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in
Frequency Domain [88.7339322596758]
We present a novel Spatial-Phase Shallow Learning (SPSL) method, which combines spatial image and phase spectrum to capture the up-sampling artifacts of face forgery.
SPSL can achieve the state-of-the-art performance on cross-datasets evaluation as well as multi-class classification and obtain comparable results on single dataset evaluation.
arXiv Detail & Related papers (2021-03-02T16:45:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.