PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer
- URL: http://arxiv.org/abs/2111.12082v1
- Date: Tue, 23 Nov 2021 18:57:11 GMT
- Title: PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer
- Authors: Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip Torr,
Guoying Zhao
- Abstract summary: Recent deep learning approaches focus on mining subtle r clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
- Score: 55.936527926778695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG), which aims at measuring heart activities
and physiological signals from facial video without any contact, has great
potential in many applications (e.g., remote healthcare and affective
computing). Recent deep learning approaches focus on mining subtle rPPG clues
using convolutional neural networks with limited spatio-temporal receptive
fields, which neglect the long-range spatio-temporal perception and interaction
for rPPG modeling. In this paper, we propose the PhysFormer, an end-to-end
video transformer based architecture, to adaptively aggregate both local and
global spatio-temporal features for rPPG representation enhancement. As key
modules in PhysFormer, the temporal difference transformers first enhance the
quasi-periodic rPPG features with temporal difference guided global attention,
and then refine the local spatio-temporal representation against interference.
Furthermore, we also propose the label distribution learning and a curriculum
learning inspired dynamic constraint in frequency domain, which provide
elaborate supervisions for PhysFormer and alleviate overfitting. Comprehensive
experiments are performed on four benchmark datasets to show our superior
performance on both intra- and cross-dataset testings. One highlight is that,
unlike most transformer networks needed pretraining from large-scale datasets,
the proposed PhysFormer can be easily trained from scratch on rPPG datasets,
which makes it promising as a novel transformer baseline for the rPPG
community. The codes will be released at
https://github.com/ZitongYu/PhysFormer.
Related papers
- PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba [20.435381963248787]
Previous deep learning based r measurement are primarily based on CNNs and Transformers.
We propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos.
Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority and efficiency of PhysMamba.
arXiv Detail & Related papers (2024-09-18T14:48:50Z) - PhySU-Net: Long Temporal Context Transformer for rPPG with
Self-Supervised Pre-training [21.521146237660766]
We propose Phy-Net, the first long-temporal spatial map r transformer network and a self-supervised pre-training strategy.
Our model is tested on two public datasets (OBF and VIPL-HR) and shows superior performance in supervised training.
arXiv Detail & Related papers (2024-02-19T07:59:16Z) - Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos [24.785755814666086]
This paper utilizes the concept of learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video.
A Temporal TokenLearner (TTL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements.
arXiv Detail & Related papers (2023-08-15T13:45:45Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose two end-to-end video transformer based on PhysFormer and Phys++++, to adaptively aggregate both local and global features for r representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-temporal and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - ETLP: Event-based Three-factor Local Plasticity for online learning with
neuromorphic hardware [105.54048699217668]
We show a competitive performance in accuracy with a clear advantage in the computational complexity for Event-Based Three-factor Local Plasticity (ETLP)
We also show that when using local plasticity, threshold adaptation in spiking neurons and a recurrent topology are necessary to learntemporal patterns with a rich temporal structure.
arXiv Detail & Related papers (2023-01-19T19:45:42Z) - Unsupervised inter-frame motion correction for whole-body dynamic PET
using convolutional long short-term memory in a convolutional neural network [9.349668170221975]
We develop an unsupervised deep learning-based framework to correct inter-frame body motion.
The motion estimation network is a convolutional neural network with a combined convolutional long short-term memory layer.
Once trained, the motion estimation inference time of our proposed network was around 460 times faster than the conventional registration baseline.
arXiv Detail & Related papers (2022-06-13T17:38:16Z) - Delayed Propagation Transformer: A Universal Computation Engine towards
Practical Control in Cyber-Physical Systems [68.75717332928205]
Multi-agent control is a central theme in the Cyber-Physical Systems.
This paper presents a new transformer-based model that specializes in the global modeling of CPS.
With physical constraint inductive bias baked into its design, our DePT is ready to plug and play for a broad class of multi-agent systems.
arXiv Detail & Related papers (2021-10-29T17:20:53Z) - Video-based Remote Physiological Measurement via Cross-verified Feature
Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and r signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z) - AutoHR: A Strong End-to-end Baseline for Remote Heart Rate Measurement
with Neural Searching [76.4844593082362]
We investigate the reason why existing end-to-end networks perform poorly in challenging conditions and establish a strong baseline for remote HR measurement with architecture search (NAS)
Comprehensive experiments are performed on three benchmark datasets on both intra-temporal and cross-dataset testing.
arXiv Detail & Related papers (2020-04-26T05:43:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.