PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer
- URL: http://arxiv.org/abs/2302.03548v1
- Date: Tue, 7 Feb 2023 15:56:03 GMT
- Title: PhysFormer++: Facial Video-based Physiological Measurement with SlowFast
Temporal Difference Transformer
- Authors: Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Yawen Cui,
Jiehua Zhang, Philip Torr and Guoying Zhao
- Abstract summary: Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
- Score: 76.40106756572644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG), which aims at measuring heart activities
and physiological signals from facial video without any contact, has great
potential in many applications (e.g., remote healthcare and affective
computing). Recent deep learning approaches focus on mining subtle rPPG clues
using convolutional neural networks with limited spatio-temporal receptive
fields, which neglect the long-range spatio-temporal perception and interaction
for rPPG modeling. In this paper, we propose two end-to-end video transformer
based architectures, namely PhysFormer and PhysFormer++, to adaptively
aggregate both local and global spatio-temporal features for rPPG
representation enhancement. As key modules in PhysFormer, the temporal
difference transformers first enhance the quasi-periodic rPPG features with
temporal difference guided global attention, and then refine the local
spatio-temporal representation against interference. To better exploit the
temporal contextual and periodic rPPG clues, we also extend the PhysFormer to
the two-pathway SlowFast based PhysFormer++ with temporal difference periodic
and cross-attention transformers. Furthermore, we propose the label
distribution learning and a curriculum learning inspired dynamic constraint in
frequency domain, which provide elaborate supervisions for PhysFormer and
PhysFormer++ and alleviate overfitting. Comprehensive experiments are performed
on four benchmark datasets to show our superior performance on both intra- and
cross-dataset testing. Unlike most transformer networks that need pretraining
from large-scale datasets, the proposed PhysFormer family can be easily trained
from scratch on rPPG datasets, which makes it promising as a novel transformer
baseline for the rPPG community.
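As a concrete reference for the temporal difference guided attention described in the abstract, below is a minimal PyTorch sketch: the query and key projections are produced by a central-difference style 3D convolution along time, and attention is computed over flattened spatio-temporal tokens. Module names (TemporalDifferenceConv, TDAttention), the theta and tau values, and the tensor shapes are illustrative assumptions, not the authors' released PhysFormer code.

```python
# Minimal sketch of temporal-difference-guided self-attention, in the spirit of
# PhysFormer's temporal difference transformer. Shapes and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDifferenceConv(nn.Module):
    """3D conv mixed with a central-difference term (CDC-style trick)."""
    def __init__(self, dim, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.theta = theta  # weight of the difference branch (assumed value)

    def forward(self, x):                      # x: (B, C, T, H, W)
        out = self.conv(x)
        # Summing the kernel gives the "center" response; subtracting it
        # emphasizes frame-to-frame differences.
        kernel_diff = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        out_diff = F.conv3d(x, kernel_diff)
        return out - self.theta * out_diff

class TDAttention(nn.Module):
    """Self-attention whose queries/keys come from temporal difference convs."""
    def __init__(self, dim, heads=4, tau=2.0):  # dim must be divisible by heads
        super().__init__()
        self.heads, self.tau = heads, tau
        self.to_q = TemporalDifferenceConv(dim)
        self.to_k = TemporalDifferenceConv(dim)
        self.to_v = nn.Conv3d(dim, dim, kernel_size=1, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, C, T, H, W)
        B, C, T, H, W = x.shape

        def tokens(t):                         # flatten grid into per-head tokens
            return (t.flatten(2).transpose(1, 2)
                     .reshape(B, T * H * W, self.heads, -1).transpose(1, 2))

        q, k, v = tokens(self.to_q(x)), tokens(self.to_k(x)), tokens(self.to_v(x))
        attn = (q @ k.transpose(-2, -1)) / (self.tau * q.shape[-1] ** 0.5)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, T * H * W, C)
        return self.proj(out)                  # (B, T*H*W, C) token sequence
```

For example, a clip tensor of shape (1, 64, 16, 8, 8) passed through TDAttention(64) yields 16*8*8 tokens of width 64; in the actual papers such blocks are stacked with feed-forward layers (and, in PhysFormer++, duplicated across slow and fast pathways with cross-attention between them).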
Related papers
- PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba [20.435381963248787]
Previous deep learning based rPPG measurement methods are primarily based on CNNs and Transformers.
We propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos.
Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority and efficiency of PhysMamba.
arXiv Detail & Related papers (2024-09-18T14:48:50Z) - PhysMamba: State Space Duality Model for Remote Physiological Measurement [20.441281420017656]
Remote Photoplethysmography (rPPG) is used in applications like emotion monitoring, medical assistance, and anti-face spoofing.
Unlike controlled laboratory settings, real-world environments often contain motion artifacts and noise.
We propose PhysMamba, a dual-path time-frequency model via State Space Duality.
This method allows the network to learn richer, more representative features, enhancing robustness in noisy conditions.
arXiv Detail & Related papers (2024-08-02T07:52:28Z) - Dual-path TokenLearner for Remote Photoplethysmography-based
Physiological Measurement with Facial Videos [24.785755814666086]
This paper utilizes the concept of learnable tokens to integrate both spatial and temporal informative contexts from the global perspective of the video.
A Temporal TokenLearner (TTL) is designed to infer the quasi-periodic pattern of heartbeats, which eliminates temporal disturbances such as head movements.
arXiv Detail & Related papers (2023-08-15T13:45:45Z) - Deeply-Coupled Convolution-Transformer with Spatial-temporal
Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework could attain better performances than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z) - ETLP: Event-based Three-factor Local Plasticity for online learning with
neuromorphic hardware [105.54048699217668]
We show competitive accuracy with a clear advantage in computational complexity for Event-Based Three-factor Local Plasticity (ETLP).
We also show that when using local plasticity, threshold adaptation in spiking neurons and a recurrent topology are necessary to learn temporal patterns with a rich temporal structure.
arXiv Detail & Related papers (2023-01-19T19:45:42Z) - Unsupervised inter-frame motion correction for whole-body dynamic PET
using convolutional long short-term memory in a convolutional neural network [9.349668170221975]
We develop an unsupervised deep learning-based framework to correct inter-frame body motion.
The motion estimation network is a convolutional neural network with a combined convolutional long short-term memory layer.
Once trained, the motion estimation inference time of our proposed network was around 460 times faster than the conventional registration baseline.
arXiv Detail & Related papers (2022-06-13T17:38:16Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Adaptive Latent Space Tuning for Non-Stationary Distributions [62.997667081978825]
We present a method for adaptive tuning of the low-dimensional latent space of deep encoder-decoder style CNNs.
We demonstrate our approach for predicting the properties of a time-varying charged particle beam in a particle accelerator.
arXiv Detail & Related papers (2021-05-08T03:50:45Z) - Video-based Remote Physiological Measurement via Cross-verified Feature
Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)