Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos
- URL: http://arxiv.org/abs/2308.07771v1
- Date: Tue, 15 Aug 2023 13:45:45 GMT
- Title: Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos
- Authors: Wei Qian, Dan Guo, Kun Li, Xilan Tian, Meng Wang
- Abstract summary: This paper utilizes the concept of learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video.
A Temporal TokenLearner (T-TL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements.
- Score: 24.785755814666086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG) based physiological measurement is an
emerging yet crucial vision task, whose challenge lies in exploring accurate
rPPG prediction, in a non-contact manner, from facial videos corrupted by noise
such as illumination variations, facial occlusions, and head movements.
Existing mainstream CNN-based models make efforts to detect physiological
signals by capturing subtle color changes in facial regions of interest (ROI)
caused by heartbeats. However, such models are constrained by the limited local
spatial or temporal receptive fields in the neural units. Unlike them, a native
Transformer-based framework called Dual-path TokenLearner (Dual-TL) is proposed
in this paper, which utilizes the concept of learnable tokens to integrate both
spatial and temporal informative contexts from the global perspective of the
video. Specifically, the proposed Dual-TL uses a Spatial TokenLearner (S-TL) to
explore associations among different facial ROIs, which keeps the rPPG
prediction away from noisy ROI disturbances. Complementarily, a Temporal
TokenLearner (T-TL) is designed to infer the quasi-periodic pattern of
heartbeats, which eliminates temporal disturbances such as head movements. The
two TokenLearners, S-TL and T-TL, are executed in a dual-path mode. This
enables the model to reduce noise disturbances for final rPPG signal
prediction. Extensive experiments on four physiological measurement benchmark
datasets are conducted. The Dual-TL achieves state-of-the-art performances in
both intra- and cross-dataset testing, demonstrating its immense potential as
a basic backbone for rPPG measurement. The source code is available at
https://github.com/VUT-HFUT/Dual-TL.
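To make the dual-path design concrete, here is a minimal, illustrative PyTorch sketch of a dual-path TokenLearner: a Spatial TokenLearner (S-TL) condenses ROI/patch features within each frame, a Temporal TokenLearner (T-TL) condenses frame-level features across time, and the two paths are fused before regressing a per-frame rPPG value. The module names, token counts, feature dimension, and fusion scheme below are assumptions made for clarity, not the authors' reference implementation (see the repository above for that).

```python
# Illustrative sketch only: a minimal dual-path TokenLearner for rPPG.
# Shapes, token counts, and the fusion scheme are assumptions, not the
# authors' reference implementation.
import torch
import torch.nn as nn


class TokenLearner(nn.Module):
    """Learn `num_tokens` attention-weighted summary tokens from a feature sequence."""

    def __init__(self, dim: int, num_tokens: int):
        super().__init__()
        self.attn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_tokens))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) -> tokens: (B, num_tokens, C)
        w = self.attn(x).softmax(dim=1)           # attention over the N inputs
        return torch.einsum("bnk,bnc->bkc", w, x)


class DualPathTokenLearner(nn.Module):
    """S-TL summarizes ROIs within each frame; T-TL summarizes frames over time."""

    def __init__(self, dim: int = 96, spatial_tokens: int = 4, temporal_tokens: int = 8):
        super().__init__()
        self.s_tl = TokenLearner(dim, spatial_tokens)   # S-TL: over ROI/patch axis
        self.t_tl = TokenLearner(dim, temporal_tokens)  # T-TL: over the time axis
        self.head = nn.Linear(dim, 1)                   # per-frame rPPG value

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, N, C) patch/ROI features for T frames
        B, T, N, C = feats.shape

        # Spatial path: learn a few tokens per frame and average them -> (B, T, C)
        spatial = self.s_tl(feats.reshape(B * T, N, C)).mean(dim=1).reshape(B, T, C)

        # Temporal path: learn quasi-periodic temporal tokens from the frame
        # features, then let every frame attend back to them.
        t_tokens = self.t_tl(spatial)                                    # (B, K_t, C)
        attn = torch.einsum("btc,bkc->btk", spatial, t_tokens).softmax(dim=-1)
        temporal = torch.einsum("btk,bkc->btc", attn, t_tokens)         # (B, T, C)

        fused = spatial + temporal                  # dual-path fusion
        return self.head(fused).squeeze(-1)         # (B, T) rPPG signal


if __name__ == "__main__":
    model = DualPathTokenLearner()
    video_feats = torch.randn(2, 160, 25, 96)  # 2 clips, 160 frames, 25 ROIs, C=96
    print(model(video_feats).shape)            # torch.Size([2, 160])
```

Running the stub at the bottom on random features of shape (2, 160, 25, 96) yields a (2, 160) tensor, i.e., one rPPG sample per frame for each clip, which is the kind of output a downstream heart-rate estimator would consume.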
Related papers
- PhysMamba: State Space Duality Model for Remote Physiological Measurement [20.441281420017656]
Remote photoplethysmography (rPPG) is used in applications like emotion monitoring, medical assistance, and face anti-spoofing.
Unlike controlled laboratory settings, real-world environments often contain motion artifacts and noise.
We propose PhysMamba, a dual-path time-frequency model via State Space Duality.
This method allows the network to learn richer, more representative features, enhancing robustness in noisy conditions.
arXiv Detail & Related papers (2024-08-02T07:52:28Z) - Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement [26.480515954528848]
We propose a novel framework that successfully integrates popular vision-language models into a remote physiological measurement task.
We develop a series of generative and contrastive learning mechanisms to optimize the framework.
Our method for the first time adapts VLMs to digest and align the frequency-related knowledge in vision and text modalities.
arXiv Detail & Related papers (2024-07-11T13:45:50Z) - StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z) - Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - Learning Motion-Robust Remote Photoplethysmography through Arbitrary
Resolution Videos [31.512551653273373]
In real-world long-term health monitoring scenarios, the distance of participants and their head movements usually vary over time, resulting in inaccurate rPPG measurement.
Different from previous rPPG models designed for a constant distance between the camera and participants, in this paper we propose two plug-and-play blocks (i.e., a physiological signal feature extraction block (PFE) and a temporal face alignment block (TFA)) to alleviate the degradation caused by changing distance and head motion.
arXiv Detail & Related papers (2022-11-30T11:50:08Z) - Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z) - TransPPG: Two-stream Transformer for Remote Heart Rate Estimate [4.866431869728018]
Non-contact facial video-based heart rate estimation using remote photoplethysmography (rPPG) has shown great potential in many applications.
However, practical applications require results to be accurate even in complex environments with head movement and unstable illumination.
We propose a novel video embedding method that embeds each facial video sequence into a feature map referred to as the Multi-scale Adaptive Spatial and Temporal Map with Overlap (a generic construction of such a map is sketched after this list).
arXiv Detail & Related papers (2022-01-26T11:11:14Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
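Several entries above, e.g. TransPPG's Multi-scale Adaptive Spatial and Temporal Map with Overlap, operate on a spatial-temporal map built from per-ROI mean colors rather than on raw frames. Below is a generic, single-scale sketch of such a map; the grid layout, lack of overlap, and min-max normalization are simplifying assumptions and do not reproduce any specific paper's recipe.

```python
# Illustrative sketch only: a generic spatial-temporal map from per-ROI mean
# colors. Grid size, overlap handling, and normalization are assumptions.
import numpy as np


def spatial_temporal_map(frames: np.ndarray, grid: int = 5) -> np.ndarray:
    """frames: (T, H, W, 3) cropped face video -> map of shape (grid*grid*3, T)."""
    T, H, W, _ = frames.shape
    rows = np.array_split(np.arange(H), grid)
    cols = np.array_split(np.arange(W), grid)
    st_map = np.empty((grid * grid * 3, T), dtype=np.float32)
    for t in range(T):
        idx = 0
        for r in rows:
            for c in cols:
                roi = frames[t][np.ix_(r, c)]                 # one grid cell (ROI)
                st_map[idx:idx + 3, t] = roi.reshape(-1, 3).mean(axis=0)
                idx += 3
    # Min-max normalize each ROI/channel row over time so slow illumination
    # drift does not swamp the subtle pulse-induced color changes.
    mins = st_map.min(axis=1, keepdims=True)
    maxs = st_map.max(axis=1, keepdims=True)
    return (st_map - mins) / (maxs - mins + 1e-6)
```

Each row of the resulting map traces one ROI/color channel over time, so the quasi-periodic pulse signal shows up as horizontal stripes that a transformer or CNN can then model.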