Learning Motion-Robust Remote Photoplethysmography through Arbitrary
Resolution Videos
- URL: http://arxiv.org/abs/2211.16922v2
- Date: Thu, 1 Dec 2022 03:01:44 GMT
- Title: Learning Motion-Robust Remote Photoplethysmography through Arbitrary
Resolution Videos
- Authors: Jianwei Li, Zitong Yu, Jingang Shi
- Abstract summary: In the real-world long-term health monitoring scenario, the distance of participants and their head movements usually vary over time, resulting in inaccurate rPPG measurement.
Different from previous rPPG models designed for a constant distance between camera and participants, in this paper we propose two plug-and-play blocks (i.e., a physiological signal feature extraction block (PFE) and a temporal face alignment block (TFA)) to alleviate the degradation caused by changing distance and head motion.
- Score: 31.512551653273373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG) enables non-contact heart rate (HR)
estimation from facial videos, which offers significant convenience compared
with traditional contact-based measurements. In the real-world long-term health
monitoring scenario, the distance of the participants and their head movements
usually vary over time, resulting in inaccurate rPPG measurement due to the
varying face resolution and complex motion artifacts. Different from previous
rPPG models designed for a constant distance between camera and participants,
in this paper we propose two plug-and-play blocks (i.e., a physiological signal
feature extraction block (PFE) and a temporal face alignment block (TFA)) to
alleviate the degradation caused by changing distance and head motion. On the
one hand, guided by representative-area information, PFE adaptively encodes
arbitrary-resolution facial frames into fixed-resolution facial structure
features. On the other hand, leveraging the estimated optical flow, TFA
counteracts the rPPG signal confusion caused by head movement, thus benefiting
motion-robust rPPG signal recovery. Besides, we also train the model with a
cross-resolution constraint using a two-stream dual-resolution framework, which
further helps PFE learn resolution-robust facial rPPG features. Extensive
experiments on three benchmark datasets (UBFC-rPPG, COHFACE and PURE)
demonstrate the superior performance of the proposed method. One highlight is
that with PFE and TFA, off-the-shelf spatio-temporal rPPG models can predict
more robust rPPG signals under both varying face resolution and severe head
movement scenarios. The code is available at
https://github.com/LJW-GIT/Arbitrary_Resolution_rPPG.
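The abstract describes PFE and TFA only at a high level; the authors' actual implementation lives in the linked repository. As a rough, hypothetical PyTorch sketch of the two ideas, the PFE-like block below uses adaptive pooling to map arbitrary-resolution face frames to fixed-resolution features, and the TFA-like function backward-warps per-frame features with a precomputed optical flow. All names, shapes, and hyperparameters are illustrative, not the authors' code.

```python
# Hypothetical sketch of PFE/TFA-style blocks; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PFELike(nn.Module):
    """Encode an arbitrary-resolution face frame to fixed-resolution features."""
    def __init__(self, in_ch=3, feat_ch=32, out_hw=(64, 64)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)
        self.out_hw = out_hw

    def forward(self, x):                             # x: (B, C, H, W), any H, W
        f = F.relu(self.conv(x))
        return F.adaptive_avg_pool2d(f, self.out_hw)  # (B, feat_ch, 64, 64)

def tfa_like_warp(feat, flow):
    """Backward-warp frame features with optical flow via grid_sample.

    feat: (B, C, H, W) features of frame t
    flow: (B, 2, H, W) flow (dx, dy) in pixels toward a reference frame
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                 # displaced pixel coordinates
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0     # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)         # (B, H, W, 2)
    return F.grid_sample(feat, norm_grid, align_corners=True)
```

With these two pieces, any off-the-shelf spatio-temporal rPPG backbone could consume fixed-resolution, motion-compensated features regardless of the input face resolution, which matches the plug-and-play claim.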
Related papers
- Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos [24.785755814666086]
This paper utilizes the concept of learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video.
A Temporal TokenLearner (TTL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements.
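A TokenLearner-style temporal module, in its common formulation, learns attention maps that pool a feature sequence into a handful of tokens. The sketch below is a minimal, hypothetical version of that idea, not the paper's TTL code; the class and argument names are assumptions.

```python
# Minimal TokenLearner-style temporal pooling; illustrative only.
import torch
import torch.nn as nn

class TemporalTokenLearner(nn.Module):
    """Pool a (B, T, C) feature sequence into K learned temporal tokens."""
    def __init__(self, channels, num_tokens=4):
        super().__init__()
        self.attn = nn.Linear(channels, num_tokens)   # per-step scores for K tokens

    def forward(self, x):                             # x: (B, T, C)
        a = self.attn(x).softmax(dim=1)               # (B, T, K): where each token attends
        return torch.einsum("btk,btc->bkc", a, x)     # (B, K, C)
```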
arXiv Detail & Related papers (2023-08-15T13:45:45Z) - Mask Attack Detection Using Vascular-weighted Motion-robust rPPG Signals [21.884783786547782]
rPPG-based face anti-spoofing methods often suffer from performance degradation due to unstable face alignment in the video sequence.
A landmark-anchored face stitching method is proposed to align the faces robustly and precisely at the pixel level by using both SIFT keypoints and facial landmarks.
A lightweight EfficientNet with a Gated Recurrent Unit (GRU) is designed to extract both spatial and temporal features for classification.
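The described design (a lightweight frame-level CNN feeding a GRU over time) can be sketched as below, using torchvision's EfficientNet-B0 as the spatial backbone. The hidden size, pooling, and classification head are assumptions, not the paper's configuration.

```python
# Hedged sketch of a per-frame CNN + GRU classifier; not the paper's exact model.
import torch.nn as nn
from torchvision.models import efficientnet_b0

class CnnGruClassifier(nn.Module):
    def __init__(self, hidden=128, num_classes=2):
        super().__init__()
        self.features = efficientnet_b0(weights=None).features  # spatial extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gru = nn.GRU(1280, hidden, batch_first=True)  # 1280 = B0 feature dim
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                           # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.features(clip.flatten(0, 1))          # (B*T, 1280, h, w)
        f = self.pool(f).flatten(1).view(b, t, -1)     # (B, T, 1280)
        _, h_n = self.gru(f)                           # h_n: (1, B, hidden)
        return self.head(h_n[-1])                      # (B, num_classes)
```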
arXiv Detail & Related papers (2023-05-25T11:22:17Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
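The core "temporal difference" idea can be approximated as a convolution whose input is augmented with frame-to-frame differences. This is a simplified stand-in for the temporal difference transformer blocks, with an assumed `theta` weighting, not the paper's formulation.

```python
# Simplified temporal-difference convolution; illustrative, not the paper's code.
import torch
import torch.nn as nn

class TemporalDifferenceConv(nn.Module):
    """3D conv over features mixed with their temporal differences."""
    def __init__(self, in_ch, out_ch, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.theta = theta                       # weight of the difference term

    def forward(self, x):                        # x: (B, C, T, H, W)
        diff = torch.cat([torch.zeros_like(x[:, :, :1]),
                          x[:, :, 1:] - x[:, :, :-1]], dim=2)
        return self.conv(x + self.theta * diff)
```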
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - Benchmarking Joint Face Spoofing and Forgery Detection with Visual and
Physiological Cues [81.15465149555864]
We establish the first joint face spoofing and forgery detection benchmark using both visual appearance and physiological rPPG cues.
To enhance rPPG periodicity discrimination, we design a two-branch physiological network using both the facial spatio-temporal rPPG signal map and its continuous wavelet transformed counterpart as inputs.
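The wavelet branch input can be reproduced in spirit with PyWavelets: a continuous wavelet transform turns a 1-D rPPG signal into a 2-D time-frequency map for the second branch. The wavelet choice and scale range below are assumptions.

```python
# Sketch: CWT of a synthetic rPPG signal into a 2-D map; parameters illustrative.
import numpy as np
import pywt

fs = 30.0                                    # video frame rate (Hz), assumed
t = np.arange(0, 10, 1 / fs)
rppg = np.sin(2 * np.pi * 1.2 * t)           # synthetic 72-bpm pulse signal

coeffs, freqs = pywt.cwt(rppg, np.arange(2, 64), "morl", sampling_period=1 / fs)
cwt_map = np.abs(coeffs)                     # (scales, time) input for the CNN branch
```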
arXiv Detail & Related papers (2022-08-10T15:41:48Z) - Identifying Rhythmic Patterns for Face Forgery Detection and
Categorization [46.21354355137544]
We propose a framework for face forgery detection and categorization consisting of: 1) a Spatial-Temporal Filtering Network (STFNet) for the filtering of PPG signals, and 2) a Spatial-Temporal Interaction Network (STINet) for the constraint and interaction of PPG signals.
With insight into the generation of forgery methods, we further propose intra-source and inter-source blending to boost the performance of the framework.
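The summary does not specify how the intra-/inter-source blending is computed; as a generic illustration only, a mixup-style linear interpolation of two samples and their labels looks like this:

```python
# Generic mixup-style blending; purely illustrative, not the paper's operation.
def blend(x1, x2, y1, y2, alpha=0.5):
    """Linearly blend two samples (and their labels) with weight alpha."""
    return alpha * x1 + (1 - alpha) * x2, alpha * y1 + (1 - alpha) * y2
```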
arXiv Detail & Related papers (2022-07-04T04:57:06Z) - Face2PPG: An unsupervised pipeline for blood volume pulse extraction
from faces [0.456877715768796]
Photoplethysmography (PPG) signals have become a key technology in many fields, such as medicine, well-being, or sports.
Our work proposes a set of pipelines to extract PPG signals from the face robustly and reliably.
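A classical unsupervised pipeline of this family is easy to sketch: spatially average the green channel of each face crop, then band-pass filter the trace to the heart-rate band. This shows the general approach, not the specific Face2PPG design.

```python
# Minimal green-channel rPPG baseline; not the Face2PPG pipeline itself.
import numpy as np
from scipy.signal import butter, filtfilt

def extract_rppg(face_frames, fs=30.0, low=0.7, high=4.0):
    """face_frames: (T, H, W, 3) RGB face crops -> 1-D pulse signal."""
    green = face_frames[..., 1].mean(axis=(1, 2))          # (T,) spatial mean
    green = (green - green.mean()) / (green.std() + 1e-8)  # normalize
    b, a = butter(3, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, green)                           # 42-240 bpm band
```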
arXiv Detail & Related papers (2022-02-08T19:06:20Z) - TransPPG: Two-stream Transformer for Remote Heart Rate Estimate [4.866431869728018]
Non-contact facial video-based heart rate estimation using remote photoplethysmography (rPPG) has shown great potential in many applications.
However, practical applications require results to be accurate even in complex environments with head movement and unstable illumination.
We propose a novel video embedding method that embeds each facial video sequence into a feature map referred to as Multi-scale Adaptive Spatial and Temporal Map with Overlap.
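The basic spatial-temporal (ST) map construction underlying such embeddings, before the multi-scale and overlap refinements, is per-region channel means stacked over time; the grid size below is an assumption.

```python
# Basic ST-map construction; the paper adds multi-scale and overlapping regions.
import numpy as np

def st_map(face_frames, grid=(5, 5)):
    """face_frames: (T, H, W, 3) -> ST map of shape (regions, T, 3)."""
    t, h, w, _ = face_frames.shape
    rows = np.array_split(np.arange(h), grid[0])
    cols = np.array_split(np.arange(w), grid[1])
    regions = []
    for r in rows:
        for c in cols:
            block = face_frames[:, r][:, :, c]             # (T, |r|, |c|, 3)
            regions.append(block.mean(axis=(1, 2)))        # (T, 3) per region
    return np.stack(regions)                               # (grid_h*grid_w, T, 3)
```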
arXiv Detail & Related papers (2022-01-26T11:11:14Z) - Total Scale: Face-to-Body Detail Reconstruction from Sparse RGBD Sensors [52.38220261632204]
Flat facial surfaces frequently occur in the PIFu-based reconstruction results.
We propose a two-scale PIFu representation to enhance the quality of the reconstructed facial details.
Experiments demonstrate the effectiveness of our approach in vivid facial details and deforming body shapes.
arXiv Detail & Related papers (2021-12-03T18:46:49Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face
Presentation Attack Detection [53.98866801690342]
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from 3D mask attacks.
We propose a pure rPPG transformer (TransRPPG) framework for learning intrinsic liveness representation efficiently.
Our TransRPPG is lightweight and efficient (with only 547K parameters and 763MOPs), which is promising for mobile-level applications.
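For a sense of scale, a deliberately tiny transformer encoder is sketched below; the dimensions are illustrative and are not chosen to reproduce the quoted 547K-parameter budget.

```python
# Tiny transformer encoder to illustrate a lightweight design; sizes are arbitrary.
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, dim_feedforward=128,
                               batch_first=True),
    num_layers=2,
)
tokens = torch.randn(1, 49, 64)               # e.g., 7x7 patches of an rPPG map
out = encoder(tokens)                         # (1, 49, 64)
print(sum(p.numel() for p in encoder.parameters()))  # rough parameter count
```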
arXiv Detail & Related papers (2021-04-15T12:33:13Z)