Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote
Physiological Measurement via Spatiotemporal Contrast
- URL: http://arxiv.org/abs/2309.06924v3
- Date: Sun, 18 Feb 2024 14:04:48 GMT
- Title: Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote
Physiological Measurement via Spatiotemporal Contrast
- Authors: Zhaodong Sun and Xiaobai Li
- Abstract summary: We propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings.
We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function.
Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals.
- Score: 22.742875409103164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video-based remote physiological measurement utilizes facial videos to
measure the blood volume change signal, which is also called remote
photoplethysmography (rPPG). Supervised methods for rPPG measurements have been
shown to achieve good performance. However, the drawback of these methods is
that they require facial videos with ground truth (GT) physiological signals,
which are often costly and difficult to obtain. In this paper, we propose
Contrast-Phys+, a method that can be trained in both unsupervised and
weakly-supervised settings. We employ a 3DCNN model to generate multiple
spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a
contrastive loss function. We further incorporate the GT signals into
contrastive learning to adapt to partial or misaligned labels. The contrastive
loss encourages rPPG/GT signals from the same video to be grouped together,
while pushing those from different videos apart. We evaluate our methods on
five publicly available datasets that include both RGB and Near-infrared
videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods,
even when using partially available or misaligned GT signals, or no labels at
all. Additionally, we highlight the advantages of our methods in terms of
computational efficiency, noise robustness, and generalization. Our code is
available at https://github.com/zhaodongsun/contrast-phys.
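As a reading aid, here is a minimal sketch of the contrastive objective the abstract describes: rPPG samples drawn from one video should have near-identical power spectra, while spectra from different videos are pushed apart. This is an illustrative PyTorch sketch, not the authors' implementation (that is at the GitHub link above); the band limits and the plain MSE pull/push terms are assumptions.

```python
import torch

def band_limited_spectrum(x, fs=30.0, low=0.66, high=3.0):
    """Normalized power spectrum of (N, T) signals, restricted to a plausible
    heart-rate band (0.66-3 Hz, i.e. roughly 40-180 bpm)."""
    x = x - x.mean(dim=-1, keepdim=True)
    psd = torch.fft.rfft(x, dim=-1).abs() ** 2
    freqs = torch.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    band = (freqs >= low) & (freqs <= high)
    psd = psd[:, band]
    return psd / psd.sum(dim=-1, keepdim=True)

def pairwise_mse(a, b):
    """Mean squared distance between every spectrum in a and every one in b."""
    return ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).mean()

def contrastive_loss(sig_a, sig_b):
    """sig_a, sig_b: (N, T) rPPG samples drawn from two *different* videos,
    e.g. from different spatial locations of the 3DCNN's output block. In the
    weakly-supervised case, a GT signal would simply join its own video's set
    as one more positive."""
    spec_a, spec_b = band_limited_spectrum(sig_a), band_limited_spectrum(sig_b)
    pull = pairwise_mse(spec_a, spec_a) + pairwise_mse(spec_b, spec_b)
    push = pairwise_mse(spec_a, spec_b)
    return pull - push
```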
Related papers
- SiNC+: Adaptive Camera-Based Vitals with Unsupervised Learning of Periodic Signals [6.458510829614774]
We present the first non-contrastive unsupervised learning framework for signal regression.
We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning periodic signals.
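The three conditions in that sentence map naturally onto three spectral penalties. Below is a hedged sketch, assuming a (batch, T) tensor of predicted signals at 30 fps and a 40-180 bpm band; the paper's exact loss formulations differ.

```python
import torch

def spectral_losses(preds, fs=30.0, low=0.66, high=3.0):
    """preds: (batch, T) predicted pulse signals. Returns three penalties:
    power leaking outside physiological limits, lack of a sharp in-band peak,
    and a batch spectrum that collapses instead of covering the band."""
    preds = preds - preds.mean(dim=-1, keepdim=True)
    psd = torch.fft.rfft(preds, dim=-1).abs() ** 2
    psd = psd / psd.sum(dim=-1, keepdim=True)
    freqs = torch.fft.rfftfreq(preds.shape[-1], d=1.0 / fs)
    in_band = (freqs >= low) & (freqs <= high)

    bandwidth = psd[:, ~in_band].sum(dim=-1).mean()             # stay inside the band
    sparsity = 1.0 - psd[:, in_band].max(dim=-1).values.mean()  # be peaked, not flat
    mean_spec = psd[:, in_band].mean(dim=0)                     # average over the batch
    uniform = torch.full_like(mean_spec, 1.0 / mean_spec.numel())
    variance = ((mean_spec - uniform) ** 2).sum()               # batch should span the band
    return bandwidth, sparsity, variance
```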
arXiv Detail & Related papers (2024-04-20T19:17:40Z)
- Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
- Non-Contact NIR PPG Sensing through Large Sequence Signal Regression [0.0]
Non-Contact sensing is an emerging technology with applications across many industries from driver monitoring in vehicles to patient monitoring in healthcare.
Current state-of-the-art approaches focus on RGB video, but these struggle in varying or noisy lighting conditions and are almost completely infeasible in the dark. Near-infrared (NIR) video, however, does not suffer from these constraints.
This paper aims to demonstrate the effectiveness of an alternative Convolution Attention Network (CAN) architecture to regress the photoplethysmography (PPG) signal from a sequence of NIR frames.
arXiv Detail & Related papers (2023-11-20T13:34:51Z)
- Non-Contrastive Unsupervised Learning of Physiological Signals from
Video [4.8327232174895745]
We present the first non-contrastive unsupervised learning framework for signal regression to break free from labelled video data.
With minimal assumptions of periodicity and finite bandwidth, our approach is capable of discovering blood volume pulse directly from unlabelled videos.
arXiv Detail & Related papers (2023-03-14T14:34:51Z)
- Improving Unsupervised Video Object Segmentation with Motion-Appearance
Synergy [52.03068246508119]
We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference.
IMAS achieves Improved UVOS with Motion-Appearance Synergy.
We demonstrate its effectiveness in tuning critical hyperparameters previously tuned with human annotation or hand-crafted, hyperparameter-specific metrics.
arXiv Detail & Related papers (2022-12-17T06:47:30Z)
- Facial Video-based Remote Physiological Measurement via Self-supervised
Learning [9.99375728024877]
We introduce a novel framework that learns to estimate rPPG signals from facial videos without the need for ground-truth signals.
Negative samples are generated via a learnable frequency module, which performs nonlinear signal frequency transformation.
Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from augmented samples.
It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction.
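The learnable frequency module itself is beyond a short sketch, but its role, manufacturing negatives whose pulse rate differs from the anchor's, can be illustrated with a fixed resampling stand-in (an assumption for illustration, not the paper's module):

```python
import numpy as np

def frequency_shift(sig, ratio):
    """Read a 1-D pulse signal at `ratio`-times speed (ratio > 0), multiplying
    its dominant frequency by `ratio`. Wrapping at the end introduces one seam,
    which is acceptable for a sketch. The negative keeps the signal's overall
    statistics but carries a deliberately wrong pulse rate."""
    n = len(sig)
    src = (np.arange(n) * ratio) % (n - 1)
    return np.interp(src, np.arange(n), sig)

# e.g. an anchor at ~70 bpm yields negatives at ~84-126 bpm:
# negative = frequency_shift(anchor, ratio=np.random.uniform(1.2, 1.8))
```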
arXiv Detail & Related papers (2022-10-27T13:03:23Z)
- Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement
via Spatiotemporal Contrast [17.691683039742323]
Video-based remote physiological measurement uses facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG).
We use a 3DCNN model to generate multiple spatiotemporal rPPG signals from different spatial locations of each video and train the model with a contrastive loss, where rPPG signals from the same video are pulled together while those from different videos are pushed apart.
arXiv Detail & Related papers (2022-08-08T19:30:57Z)
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical
Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrastive learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a single pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
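A minimal sketch of that training recipe, assuming any small image-to-image CNN (the paper's architecture and schedule differ): overfitting one original/processed pair and stopping early keeps the consistent mapping while discarding frame-wise flicker.

```python
import torch
import torch.nn as nn

net = nn.Sequential(                       # stand-in for the paper's network
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

def fit_dvp(original, processed, steps=500):
    """original, processed: (T, 3, H, W) frames of ONE video pair. The network
    is trained on this pair alone; early stopping is what realizes the prior."""
    for _ in range(steps):
        t = torch.randint(0, original.shape[0], (1,)).item()
        loss = (net(original[t:t+1]) - processed[t:t+1]).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                  # temporally consistent re-render
        return torch.cat([net(original[t:t+1]) for t in range(original.shape[0])])
```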
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatiotemporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
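PhysFormer builds its attention on a temporal difference operator. Here is a simplified sketch of that idea, blending a vanilla 3D convolution with the same convolution applied to frame-to-frame differences; the paper's exact formulation differs.

```python
import torch
import torch.nn as nn

class TemporalDifferenceConv(nn.Module):
    """Mix a plain 3D convolution with one over temporal differences, so the
    layer is explicitly sensitive to subtle frame-to-frame color changes."""
    def __init__(self, cin, cout, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(cin, cout, kernel_size=3, padding=1)
        self.theta = theta            # 0 = vanilla conv, 1 = differences only

    def forward(self, x):             # x: (B, C, T, H, W)
        diff = torch.cat(
            [torch.zeros_like(x[:, :, :1]), x[:, :, 1:] - x[:, :, :-1]], dim=2
        )
        return (1 - self.theta) * self.conv(x) + self.theta * self.conv(diff)
```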
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with recurrent conditional generative adversarial network.
PLVC learns to compress video towards good perceptual quality at low bit-rate.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
arXiv Detail & Related papers (2021-09-07T13:36:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.