BeatFormer: Efficient motion-robust remote heart rate estimation through unsupervised spectral zoomed attention filters
- URL: http://arxiv.org/abs/2507.14885v1
- Date: Sun, 20 Jul 2025 10:00:31 GMT
- Title: BeatFormer: Efficient motion-robust remote heart rate estimation through unsupervised spectral zoomed attention filters
- Authors: Joaquim Comas, Federico Sukno
- Abstract summary: Remote photoplethysmography (rPPG) captures cardiac signals from facial videos and is gaining attention for its diverse applications. While deep learning has advanced rPPG estimation, it relies on large, diverse datasets for effective generalization. We present BeatFormer, a lightweight spectral attention model for rPPG estimation, which integrates zoomed orthonormal complex attention and frequency-domain energy measurement. We also introduce Spectral Contrastive Learning (SCL), which allows BeatFormer to be trained without any PPG or HR labels.
- Score: 3.069335774032178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Remote photoplethysmography (rPPG) captures cardiac signals from facial videos and is gaining attention for its diverse applications. While deep learning has advanced rPPG estimation, it relies on large, diverse datasets for effective generalization. In contrast, handcrafted methods utilize physiological priors for better generalization in unseen scenarios like motion while maintaining computational efficiency. However, their linear assumptions limit performance in complex conditions, where deep learning provides superior pulsatile information extraction. This highlights the need for hybrid approaches that combine the strengths of both methods. To address this, we present BeatFormer, a lightweight spectral attention model for rPPG estimation, which integrates zoomed orthonormal complex attention and frequency-domain energy measurement, enabling a highly efficient model. Additionally, we introduce Spectral Contrastive Learning (SCL), which allows BeatFormer to be trained without any PPG or HR labels. We validate BeatFormer on the PURE, UBFC-rPPG, and MMPD datasets, demonstrating its robustness and performance, particularly in cross-dataset evaluations under motion scenarios.
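The abstract mentions zoomed spectral processing and frequency-domain energy measurement. As a rough illustration of the generic "spectral zoom" idea only (this is not BeatFormer's learned attention filter, and not code from the paper), the sketch below evaluates the DFT on a dense frequency grid restricted to a plausible heart-rate band and picks the highest-energy bin. The function names, the 0.66-3.0 Hz band, the 30 fps sampling rate, and the synthetic signal are all assumptions made for the example.

```python
import cmath
import math


def zoomed_spectrum(signal, fs, f_lo, f_hi, n_bins):
    """Return (freqs, magnitudes) of the DFT evaluated on n_bins
    evenly spaced frequencies in [f_lo, f_hi). Hypothetical helper."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]  # remove the DC offset
    step = (f_hi - f_lo) / n_bins
    freqs = [f_lo + k * step for k in range(n_bins)]
    mags = []
    for f in freqs:
        w = -2j * math.pi * f / fs  # per-sample phase increment
        acc = sum(x * cmath.exp(w * n) for n, x in enumerate(centered))
        mags.append(abs(acc))
    return freqs, mags


def estimate_hr_bpm(signal, fs, f_lo=0.66, f_hi=3.0, n_bins=512):
    """Pick the highest-magnitude bin in the assumed heart-rate band."""
    freqs, mags = zoomed_spectrum(signal, fs, f_lo, f_hi, n_bins)
    peak = max(range(n_bins), key=mags.__getitem__)
    return 60.0 * freqs[peak]


# Synthetic 10 s "pulse" at 1.2 Hz (72 bpm) plus a slow 0.2 Hz drift,
# sampled at an assumed camera rate of 30 frames per second.
fs = 30.0
pulse = [
    math.sin(2 * math.pi * 1.2 * n / fs)
    + 0.3 * math.sin(2 * math.pi * 0.2 * n / fs)
    for n in range(300)
]
hr = estimate_hr_bpm(pulse, fs)
print(f"estimated heart rate: {hr:.1f} bpm")
```

A plain FFT of the same 10 s window would place bins 0.1 Hz (6 bpm) apart, whereas the zoomed grid above samples the band roughly every 0.27 bpm. The underlying resolution is still set by the window length; the zoom only samples the spectrum more finely where the heart rate is expected.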
Related papers
- Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection [59.042018542376596]
Graph anomaly detection (GAD) has garnered increasing attention in recent years, yet remains challenging due to two key factors. Anomaly-Aware Pre-Training and Fine-Tuning (APF) is a framework to mitigate the challenges in GAD. Comprehensive experiments on 10 benchmark datasets validate the superior performance of APF in comparison to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-19T09:57:35Z) - Recovering Pulse Waves from Video Using Deep Unrolling and Deep Equilibrium Models [45.94962431110573]
Camera-based monitoring of vital signs, also known as imaging photoplethysmography (iPPG), has seen applications in driver monitoring, affective computing, and more. We introduce methods that combine signal processing and deep learning within an inverse-problem formulation.
arXiv Detail & Related papers (2025-03-21T16:11:21Z) - Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography [5.743550396843244]
MAR-rPPG is a framework that integrates the impact of ROI localization and complex motion artifacts.
MAR-rPPG introduces a masked attention regularization mechanism into the rPPG field to capture semantic consistency of facial clips.
It also employs a masking technique to prevent the model from overfitting on inaccurate ROIs and subsequently degrading its performance.
arXiv Detail & Related papers (2024-07-09T08:25:30Z) - Deep adaptative spectral zoom for improved remote heart rate estimation [10.220888127527152]
The Chirp-Z Transform (CZT) can refine the spectrum to the narrow-band range of interest for heart rate, providing improved frequency resolution and, consequently, more accurate estimation.
This paper presents the advantages of employing the CZT for remote HR estimation and introduces a novel data-driven adaptive CZT estimator.
arXiv Detail & Related papers (2024-03-11T16:55:19Z) - RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention [18.412642801957197]
Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals based on physiological videos. This paper proposes a periodic attention mechanism based on temporal attention sparsity induced by periodicity. It achieves state-of-the-art performance in both intra-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2024-02-20T07:56:02Z) - rPPG-MAE: Self-supervised Pre-training with Masked Autoencoders for Remote Physiological Measurement [36.54109704201048]
Remote photoplethysmography (rPPG) is an important technique for perceiving human vital signs.
In this paper, we develop a self-supervised framework for extracting the inherent self-similar prior in physiological signals.
We also evaluate the proposed method on two public datasets, namely PURE and UBFC-rPPG.
arXiv Detail & Related papers (2023-06-04T08:53:28Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose two end-to-end video transformer based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - Diffusion Probabilistic Model Made Slim [128.2227518929644]
We introduce a customized design for slim diffusion probabilistic models (DPM) for light-weight image synthesis.
We achieve 8-18x computational complexity reduction as compared to the latent diffusion models on a series of conditional and unconditional image generation tasks.
arXiv Detail & Related papers (2022-11-27T16:27:28Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - AutoHR: A Strong End-to-end Baseline for Remote Heart Rate Measurement with Neural Searching [76.4844593082362]
We investigate why existing end-to-end networks perform poorly in challenging conditions and establish a strong baseline for remote HR measurement with neural architecture search (NAS).
Comprehensive experiments are performed on three benchmark datasets on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2020-04-26T05:43:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.