TransPPG: Two-stream Transformer for Remote Heart Rate Estimate
- URL: http://arxiv.org/abs/2201.10873v1
- Date: Wed, 26 Jan 2022 11:11:14 GMT
- Title: TransPPG: Two-stream Transformer for Remote Heart Rate Estimate
- Authors: Jiaqi Kang, Su Yang, Weishan Zhang
- Abstract summary: Non-contact facial video-based heart rate estimation using remote photoplethysthy (r) has shown great potential in many applications.
However, practical applications require results to be accurate even under complex environment with head movement and unstable illumination.
We propose a novel video embedding method that embeds each facial video sequence into a feature map referred to as Multi-scale Adaptive Spatial and Temporal Map with Overlap.
- Score: 4.866431869728018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-contact facial video-based heart rate estimation using remote
photoplethysmography (rPPG) has shown great potential in many applications
(e.g., remote health care) and achieved creditable results in constrained
scenarios. However, practical applications require results to be accurate even
in complex environments with head movement and unstable illumination.
Improving the performance of rPPG in such environments has therefore become a
key challenge. In this paper, we propose a novel video embedding method that
embeds each facial video sequence into a feature map referred to as the
Multi-scale Adaptive Spatial and Temporal Map with Overlap (MAST_Mop), which
contains not only vital-sign information but also surrounding information as a
reference, acting as a mirror that reveals the homogeneous perturbations, such
as illumination instability, imposed on foreground and background simultaneously.
Correspondingly, we propose a two-stream Transformer model that maps the MAST_Mop
to heart rate (HR), where one stream follows the pulse signal in the facial
area while the other captures the perturbation signal in the surrounding
region, such that the difference of the two channels leads to adaptive noise
cancellation. Our approach significantly outperforms current state-of-the-art
methods on two public datasets, MAHNOB-HCI and VIPL-HR. To the best of our
knowledge, this is the first work to use a Transformer backbone to capture the
temporal dependencies in rPPG signals and to apply a two-stream scheme that
treats background interference as a mirror of the corresponding perturbation
on the foreground signal for noise tolerance.
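To make the two-stream scheme concrete, the following is a minimal PyTorch sketch of the idea the abstract describes: one Transformer encoder follows the facial (foreground) feature sequence, a second follows the surrounding (background) sequence, and the background stream is subtracted from the foreground stream before heart-rate regression. All module names, dimensions, and the simple subtraction fusion are illustrative assumptions, not the paper's actual MAST_Mop construction or TransPPG architecture.

```python
import torch
import torch.nn as nn

class TwoStreamRPPG(nn.Module):
    """Illustrative two-stream Transformer for rPPG-style HR regression.

    A sketch of the general idea only; the real TransPPG model
    (MAST_Mop embedding, fusion, depths, heads) is not reproduced here.
    """

    def __init__(self, feat_dim=64, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # shared token embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        # nn.TransformerEncoder deep-copies the layer, so the two
        # encoders below have independent parameters.
        self.fg_encoder = nn.TransformerEncoder(layer, num_layers)  # pulse
        self.bg_encoder = nn.TransformerEncoder(layer, num_layers)  # noise
        self.head = nn.Linear(d_model, 1)  # scalar HR estimate (e.g., bpm)

    def forward(self, fg_tokens, bg_tokens):
        # fg_tokens, bg_tokens: (batch, time, feat_dim) temporal features
        # from the facial region and its surroundings, respectively.
        fg = self.fg_encoder(self.proj(fg_tokens))
        bg = self.bg_encoder(self.proj(bg_tokens))
        # Assumed fusion: perturbations common to both regions (e.g.,
        # illumination changes) cancel in the difference, leaving the
        # pulse-related component.
        denoised = fg - bg
        return self.head(denoised.mean(dim=1)).squeeze(-1)

model = TwoStreamRPPG()
fg = torch.randn(2, 160, 64)  # e.g., 160 frames of facial features
bg = torch.randn(2, 160, 64)  # matching background features
print(model(fg, bg).shape)    # torch.Size([2])
```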
Related papers
- Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution [7.252121550658619]
The Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstruction.
However, high-frequency details generated by DDPM often suffer from misalignment with high-resolution (HR) images due to the model's tendency to overlook long-range semantic contexts.
An adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM.
arXiv Detail & Related papers (2024-03-17T04:08:58Z) - Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos [24.785755814666086]
This paper utilizes the concept of learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video.
A Temporal TokenLearner (TTL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements.
arXiv Detail & Related papers (2023-08-15T13:45:45Z) - Learning Feature Recovery Transformer for Occluded Person
Re-identification [71.18476220969647]
We propose a new approach called Feature Recovery Transformer (FRT) to address two key challenges of occluded person re-identification simultaneously.
To reduce the interference of the noise during feature matching, we mainly focus on visible regions that appear in both images and develop a visibility graph to calculate the similarity.
In terms of the second challenge, based on the developed graph similarity, for each query image, we propose a recovery transformer that exploits the feature sets of its $k$-nearest neighbors in the gallery to recover the complete features.
arXiv Detail & Related papers (2023-01-05T02:36:16Z) - Learning Motion-Robust Remote Photoplethysmography through Arbitrary
Resolution Videos [31.512551653273373]
In real-world long-term health monitoring scenarios, the distance between participants and the camera and their head movements usually vary over time, resulting in inaccurate rPPG measurements.
Different from previous rPPG models designed for a constant distance between camera and participants, in this paper we propose two plug-and-play blocks (i.e., a physiological signal feature extraction (PFE) block and a temporal face alignment (TFA) block) to alleviate the degradation caused by changing distance and head motion.
arXiv Detail & Related papers (2022-11-30T11:50:08Z) - Blur Interpolation Transformer for Real-World Motion from Blur [52.10523711510876]
We propose a blur interpolation transformer (BiT) to unravel the underlying temporal correlation in blur.
Based on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies.
In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs.
arXiv Detail & Related papers (2022-11-21T13:10:10Z) - Deep Reinforcement Learning for IRS Phase Shift Design in
Spatiotemporally Correlated Environments [93.30657979626858]
We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion.
We show that, when channels are temporally correlated, the inclusion of the SNR in the state representation interacts with function approximation in ways that inhibit convergence.
arXiv Detail & Related papers (2022-11-02T22:07:36Z) - Facial Video-based Remote Physiological Measurement via Self-supervised
Learning [9.99375728024877]
We introduce a novel framework that learns to estimate rPPG signals from facial videos without the need for ground-truth signals.
Negative samples are generated via a learnable frequency module, which performs nonlinear signal frequency transformation.
Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples.
It encodes complementary pulsation information from different face regions and aggregates it into a single rPPG prediction.
arXiv Detail & Related papers (2022-10-27T13:03:23Z) - DRNet: Decomposition and Reconstruction Network for Remote Physiological
Measurement [39.73408626273354]
Existing methods are generally divided into two groups.
The first focuses on mining the subtle blood volume pulse (BVP) signals from face videos, but seldom explicitly models the noise that dominates face video content.
The second focuses on modeling noisy data directly, which results in suboptimal performance due to the lack of regularity in such severe random noise.
arXiv Detail & Related papers (2022-06-12T07:40:10Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Augmented Transformer with Adaptive Graph for Temporal Action Proposal
Generation [79.98992138865042]
We present an augmented transformer with adaptive graph network (ATAG) to exploit both long-range and local temporal contexts for TAPG.
Specifically, we enhance the vanilla transformer by equipping it with a snippet actionness loss and a front block, dubbed the augmented transformer.
An adaptive graph convolutional network (GCN) is proposed to build local temporal context by mining the position information and difference between adjacent features.
arXiv Detail & Related papers (2021-03-30T02:01:03Z) - Intrinsic Temporal Regularization for High-resolution Human Video
Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$-resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)