PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer
- URL: http://arxiv.org/abs/2302.03548v1
- Date: Tue, 7 Feb 2023 15:56:03 GMT
- Title: PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer
- Authors: Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Yawen Cui,
Jiehua Zhang, Philip Torr and Guoying Zhao
- Abstract summary: Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
- Score: 76.40106756572644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG), which aims at measuring heart activities
and physiological signals from facial video without any contact, has great
potential in many applications (e.g., remote healthcare and affective
computing). Recent deep learning approaches focus on mining subtle rPPG clues
using convolutional neural networks with limited spatio-temporal receptive
fields, which neglect the long-range spatio-temporal perception and interaction
for rPPG modeling. In this paper, we propose two end-to-end video transformer
based architectures, namely PhysFormer and PhysFormer++, to adaptively
aggregate both local and global spatio-temporal features for rPPG
representation enhancement. As key modules in PhysFormer, the temporal
difference transformers first enhance the quasi-periodic rPPG features with
temporal difference guided global attention, and then refine the local
spatio-temporal representation against interference. To better exploit the
temporal contextual and periodic rPPG clues, we also extend the PhysFormer to
the two-pathway SlowFast based PhysFormer++ with temporal difference periodic
and cross-attention transformers. Furthermore, we propose the label
distribution learning and a curriculum learning inspired dynamic constraint in
frequency domain, which provide elaborate supervisions for PhysFormer and
PhysFormer++ and alleviate overfitting. Comprehensive experiments are performed
on four benchmark datasets to show our superior performance on both intra- and
cross-dataset testing. Unlike most transformer networks that need pretraining
from large-scale datasets, the proposed PhysFormer family can be easily trained
from scratch on rPPG datasets, which makes it promising as a novel transformer
baseline for the rPPG community.
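As a concrete reference for the temporal difference guided attention described in the abstract, below is a minimal PyTorch sketch: the query and key projections are produced by a central-difference style 3D convolution along time, and attention is computed over flattened spatio-temporal tokens. Module names (TemporalDifferenceConv, TDAttention), the theta and tau values, and the tensor shapes are illustrative assumptions, not the authors' released PhysFormer code.

```python
# Minimal sketch of temporal-difference-guided self-attention, in the spirit of
# PhysFormer's temporal difference transformer. Shapes and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDifferenceConv(nn.Module):
    """3D conv mixed with a central-difference term (CDC-style trick)."""
    def __init__(self, dim, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.theta = theta  # weight of the difference branch (assumed value)

    def forward(self, x):                      # x: (B, C, T, H, W)
        out = self.conv(x)
        # Summing the kernel gives the "center" response; subtracting it
        # emphasizes frame-to-frame differences.
        kernel_diff = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        out_diff = F.conv3d(x, kernel_diff)
        return out - self.theta * out_diff

class TDAttention(nn.Module):
    """Self-attention whose queries/keys come from temporal difference convs."""
    def __init__(self, dim, heads=4, tau=2.0):  # dim must be divisible by heads
        super().__init__()
        self.heads, self.tau = heads, tau
        self.to_q = TemporalDifferenceConv(dim)
        self.to_k = TemporalDifferenceConv(dim)
        self.to_v = nn.Conv3d(dim, dim, kernel_size=1, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, C, T, H, W)
        B, C, T, H, W = x.shape

        def tokens(t):                         # flatten grid into per-head tokens
            return (t.flatten(2).transpose(1, 2)
                     .reshape(B, T * H * W, self.heads, -1).transpose(1, 2))

        q, k, v = tokens(self.to_q(x)), tokens(self.to_k(x)), tokens(self.to_v(x))
        attn = (q @ k.transpose(-2, -1)) / (self.tau * q.shape[-1] ** 0.5)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, T * H * W, C)
        return self.proj(out)                  # (B, T*H*W, C) token sequence
```

For example, a clip tensor of shape (1, 64, 16, 8, 8) passed through TDAttention(64) yields 16*8*8 tokens of width 64; in the actual papers such blocks are stacked with feed-forward layers (and, in PhysFormer++, duplicated across slow and fast pathways with cross-attention between them).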
Related papers
- PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba [20.435381963248787]
Previous deep learning based rPPG measurement methods are primarily based on CNNs and Transformers.
We propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos.
Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority and efficiency of PhysMamba.
arXiv Detail & Related papers (2024-09-18T14:48:50Z) - PhysMamba: State Space Duality Model for Remote Physiological Measurement [20.441281420017656]
Remote Photoplethysmography (rPPG) is used in applications like emotion monitoring, medical assistance, and anti-face spoofing.
Unlike controlled laboratory settings, real-world environments often contain motion artifacts and noise.
We propose PhysMamba, a dual-path time-frequency model via State Space Duality.
This method allows the network to learn richer, more representative features, enhancing robustness in noisy conditions.
arXiv Detail & Related papers (2024-08-02T07:52:28Z) - Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos [24.785755814666086]
This paper utilizes the concept of learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video.
A Temporal TokenLearner (TTL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements.
arXiv Detail & Related papers (2023-08-15T13:45:45Z) - Deeply-Coupled Convolution-Transformer with Spatial-temporal
Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework could attain better performances than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z) - ETLP: Event-based Three-factor Local Plasticity for online learning with
neuromorphic hardware [105.54048699217668]
We show competitive accuracy with a clear advantage in computational complexity for Event-Based Three-factor Local Plasticity (ETLP).
We also show that when using local plasticity, threshold adaptation in spiking neurons and a recurrent topology are necessary to learn temporal patterns with a rich temporal structure.
arXiv Detail & Related papers (2023-01-19T19:45:42Z) - Unsupervised inter-frame motion correction for whole-body dynamic PET
using convolutional long short-term memory in a convolutional neural network [9.349668170221975]
We develop an unsupervised deep learning-based framework to correct inter-frame body motion.
The motion estimation network is a convolutional neural network with a combined convolutional long short-term memory layer.
Once trained, the motion estimation inference time of our proposed network was around 460 times faster than the conventional registration baseline.
arXiv Detail & Related papers (2022-06-13T17:38:16Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Adaptive Latent Space Tuning for Non-Stationary Distributions [62.997667081978825]
We present a method for adaptive tuning of the low-dimensional latent space of deep encoder-decoder style CNNs.
We demonstrate our approach for predicting the properties of a time-varying charged particle beam in a particle accelerator.
arXiv Detail & Related papers (2021-05-08T03:50:45Z) - Video-based Remote Physiological Measurement via Cross-verified Feature
Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)