Facial Video-based Remote Physiological Measurement via Self-supervised
Learning
- URL: http://arxiv.org/abs/2210.15401v3
- Date: Sat, 22 Jul 2023 07:21:11 GMT
- Title: Facial Video-based Remote Physiological Measurement via Self-supervised
Learning
- Authors: Zijie Yue, Miaojing Shi, Shuai Ding
- Abstract summary: We introduce a novel framework that learns to estimate rPPG signals from facial videos without the need for ground truth signals.
Negative samples are generated via a learnable frequency augmentation module, which performs nonlinear signal frequency transformation.
Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from augmented samples.
It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction.
- Score: 9.99375728024877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial video-based remote physiological measurement aims to estimate remote
photoplethysmography (rPPG) signals from human face videos and then measure
multiple vital signs (e.g. heart rate, respiration frequency) from rPPG
signals. Recent approaches achieve it by training deep neural networks, which
normally require abundant facial videos and synchronously recorded
photoplethysmography (PPG) signals for supervision. However, the collection of
these annotated corpora is not easy in practice. In this paper, we introduce a
novel frequency-inspired self-supervised framework that learns to estimate rPPG
signals from facial videos without the need for ground truth PPG signals. Given
a video sample, we first augment it into multiple positive/negative samples
which contain similar/dissimilar signal frequencies to the original one.
Specifically, positive samples are generated using spatial augmentation.
Negative samples are generated via a learnable frequency augmentation module,
which performs non-linear signal frequency transformation on the input without
excessively changing its visual appearance. Next, we introduce a local rPPG
expert aggregation module to estimate rPPG signals from augmented samples. It
encodes complementary pulsation information from different face regions and
aggregates them into one rPPG prediction. Finally, we propose a series of
frequency-inspired losses, i.e. frequency contrastive loss, frequency ratio
consistency loss, and cross-video frequency agreement loss, for the
optimization of estimated rPPG signals from multiple augmented video samples
and across temporally neighboring video samples. We conduct rPPG-based heart
rate, heart rate variability and respiration frequency estimation on four
standard benchmarks. The experimental results demonstrate that our method
improves the state of the art by a large margin.
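
The abstract mentions measuring heart rate from an estimated rPPG signal but does not spell out that step. A minimal sketch of one common approach is given here: take the dominant spectral peak of the signal within a plausible heart-rate band. The function name `estimate_heart_rate_bpm`, the 0.7 to 4 Hz band, the Hann window, the zero-padding factor, and the synthetic test signal are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def estimate_heart_rate_bpm(signal, fs, low_hz=0.7, high_hz=4.0, pad_factor=8):
    """Return the dominant frequency of `signal` inside [low_hz, high_hz], in beats per minute.

    Hypothetical helper: the band limits and padding factor are illustrative assumptions.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC component so it does not dominate the spectrum
    windowed = x * np.hanning(len(x))  # taper the edges to reduce spectral leakage
    n_fft = pad_factor * len(x)  # zero-padding gives a finer frequency grid
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= low_hz) & (freqs <= high_hz)  # keep only plausible heart rates
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz


# Synthetic stand-in for an estimated rPPG signal: a 1.2 Hz pulse plus mild noise,
# sampled at an assumed 30 frames per second for 10 seconds.
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
rng = np.random.default_rng(0)
synthetic_rppg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)

hr_bpm = estimate_heart_rate_bpm(synthetic_rppg, fs)
print(f"Estimated heart rate: {hr_bpm:.1f} bpm")
```

For the synthetic 1.2 Hz input, the estimate should land close to 72 bpm. A real pipeline would typically add band-pass filtering and sliding windows; those are left out to keep the sketch short.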
Related papers
- Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training.
We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
arXiv Detail & Related papers (2024-09-16T15:10:07Z)
- Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement [26.480515954528848]
We propose a novel framework that successfully integrates popular vision-language models into a remote physiological measurement task.
We develop a series of generative and contrastive learning mechanisms to optimize the framework.
Our method for the first time adapts VLMs to digest and align the frequency-related knowledge in vision and text modalities.
arXiv Detail & Related papers (2024-07-11T13:45:50Z)
- SiNC+: Adaptive Camera-Based Vitals with Unsupervised Learning of Periodic Signals [6.458510829614774]
We present the first non-contrastive unsupervised learning framework for signal regression.
We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning periodic signals.
arXiv Detail & Related papers (2024-04-20T19:17:40Z)
- DopUS-Net: Quality-Aware Robotic Ultrasound Imaging based on Doppler Signal [48.97719097435527]
DopUS-Net combines the Doppler images with B-mode images to increase the segmentation accuracy and robustness of small blood vessels.
An artery re-identification module qualitatively evaluates the real-time segmentation results and automatically optimizes the probe pose for enhanced Doppler images.
arXiv Detail & Related papers (2023-05-15T18:19:29Z)
- Non-Contrastive Unsupervised Learning of Physiological Signals from Video [4.8327232174895745]
We present the first non-contrastive unsupervised learning framework for signal regression to break free from labelled video data.
With minimal assumptions of periodicity and finite bandwidth, our approach is capable of discovering blood volume pulse directly from unlabelled videos.
arXiv Detail & Related papers (2023-03-14T14:34:51Z)
- Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast [17.691683039742323]
Video-based remote physiological measurement utilizes face videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG).
We use a 3DCNN model to generate multiple rPPG signals from each video in different spatiotemporal locations and train the model with a contrastive loss where rPPG signals from the same video are pulled together while those from different videos are pushed away.
arXiv Detail & Related papers (2022-08-08T19:30:57Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on frequency domain find that the GAN forged images have obvious grid-like visual artifacts in the frequency spectrum compared to the real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- WPPG Net: A Non-contact Video Based Heart Rate Extraction Network Framework with Compatible Training Capability [21.33542693986985]
Our facial skin presents a subtle color change known as the remote photoplethysmography (rPPG) signal, from which we could extract the heart rate of the subject.
Recently, many deep learning methods and related datasets on rPPG signal extraction have been proposed.
However, because of the time blood takes to flow through the body and other factors, label waves such as BVP signals have uncertain delays relative to the real rPPG signals in some datasets.
In this paper, by analyzing the common characteristics on rhythm and periodicity of rPPG signals and label waves, we propose a whole set of training methodology which wraps these networks so that they could remain efficient when be trained at
arXiv Detail & Related papers (2022-07-04T19:52:30Z)
- Identifying Rhythmic Patterns for Face Forgery Detection and Categorization [46.21354355137544]
We propose a framework for face forgery detection and categorization consisting of: 1) a Spatial-Temporal Filtering Network (STFNet) for PPG signals, and 2) a Spatial-Temporal Interaction Network (STINet) for constraint and interaction of PPG signals.
With insight into the generation of forgery methods, we further propose intra-source and inter-source blending to boost the performance of the framework.
arXiv Detail & Related papers (2022-07-04T04:57:06Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)