Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models
- URL: http://arxiv.org/abs/2601.12215v1
- Date: Sun, 18 Jan 2026 01:34:47 GMT
- Title: Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models
- Authors: Megha Thukral, Cyrus Tanade, Simon A. Lee, Juhyeon Lee, Hao Zhou, Keum San Chun, Migyeong Gwak, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Mehrab Bin Morshed, Subramaniam Venkatraman, Sharanya Arcot Desai,
- Abstract summary: Masked Multiscale Reconstruction (MMR) is a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. We pretrain our model with MMR using 17 million unlabeled 10-second PPG segments from 32,000 users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG models.
- Score: 13.267230682892503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning - a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is designed to reconstruct randomly masked out coefficients obtained from a wavelet-based multiresolution decomposition of PPG signals, forcing the transformer encoder to integrate information across temporal and spectral scales. We pretrain our model with MMR using ~17 million unlabeled 10-second PPG segments from ~32,000 smartwatch users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG foundation models, time-series foundation models, and other self-supervised baselines. Extensive analysis of our learned embeddings and systematic ablations underscores the value of wavelet-based representations, showing that they capture robust and physiologically-grounded features. Together, these results highlight the potential of MMR as a step toward generalizable PPG foundation models.
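The pretraining objective described in the abstract (decompose a PPG segment into wavelet coefficients, mask some of them at random, and score the model on reconstructing the masked ones) can be illustrated with a small sketch. This is not the authors' implementation: the Haar wavelet, the three decomposition levels, the per-coefficient masking, the 0.5 mask ratio, the sampling rate, and every function name below are assumptions chosen for illustration, and the transformer encoder is omitted entirely.

```python
import math
import random


def haar_dwt(signal, levels):
    """Multilevel Haar decomposition.

    Returns [approx_L, detail_L, ..., detail_1], coarsest band first.
    The Haar wavelet is an assumption; the abstract does not name one.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return [approx] + details[::-1]


def haar_idwt(coeffs):
    """Invert haar_dwt, rebuilding the signal from coarse to fine."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        finer = []
        for a, d in zip(approx, detail):
            finer.append((a + d) / math.sqrt(2))
            finer.append((a - d) / math.sqrt(2))
        approx = finer
    return approx


def random_mask(coeffs, mask_ratio, rng):
    """Zero out each coefficient independently with probability mask_ratio.

    Returns the masked bands and boolean flags (True = masked).
    """
    masked, flags = [], []
    for band in coeffs:
        band_flags = [rng.random() < mask_ratio for _ in band]
        masked.append([0.0 if f else c for c, f in zip(band, band_flags)])
        flags.append(band_flags)
    return masked, flags


def masked_mse(pred, target, flags):
    """Mean squared error computed over masked coefficients only."""
    errors = [
        (p - t) ** 2
        for pred_band, target_band, flag_band in zip(pred, target, flags)
        for p, t, f in zip(pred_band, target_band, flag_band)
        if f
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    fs = 25.6  # hypothetical sampling rate giving 256 samples in 10 seconds
    # Synthetic stand-in for a PPG segment: a 1.2 Hz pulse plus a slow 0.25 Hz drift.
    segment = [
        math.sin(2 * math.pi * 1.2 * n / fs) + 0.3 * math.sin(2 * math.pi * 0.25 * n / fs)
        for n in range(256)
    ]
    bands = haar_dwt(segment, levels=3)
    masked_bands, flags = random_mask(bands, mask_ratio=0.5, rng=random.Random(0))
    # A trained encoder would predict the masked coefficients; here the
    # zero-filled input serves as a trivial baseline prediction.
    print("band lengths:", [len(b) for b in bands])
    print("baseline masked MSE:", masked_mse(masked_bands, bands, flags))
```

Because the coarse band carries slow rhythmic content and the fine bands carry waveform detail, a loss restricted to masked coefficients can only be reduced by inferring one scale from the others, which is the intuition the abstract gives for the pretraining task.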
Related papers
- PENGUIN: General Vital Sign Reconstruction from PPG with Flow Matching State Space Model [0.0]
Photoplethysmography (PPG) plays a crucial role in continuous cardiovascular health monitoring as a non-invasive and cost-effective modality. Existing estimation methods are often restricted to a single task or environment, limiting their generalizability across diverse PPG decoding scenarios. We propose PENGUIN, a generative flow-matching framework that extends deep state space models, enabling fine-grained conditioning on PPG for reconstructing vital signs as continuous waveforms.
arXiv Detail & Related papers (2026-01-23T13:23:38Z)
- Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement [0.0]
Remote photoplethysmography (rPPG) is an emerging physiological sensing technique that leverages subtle color variations in facial videos to estimate vital signs such as heart rate and respiratory rate. This non-invasive technique has gained traction across diverse domains, but its ability to capture fine-grained temporal dynamics under real-world conditions has been underexplored. We propose Reperio-rPPG, a novel framework that strategically integrates a Transformer to effectively capture the periodic structure.
arXiv Detail & Related papers (2025-11-08T09:41:34Z)
- TYrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation [51.56484100374058]
This paper introduces an innovative video understanding block (GVB) designed for efficient analysis of RGB videos. Based on the Mam structure, this block integrates 2D-CNN and 3D-CNN to enhance video understanding for analysis. Experiments show that our TYrPPG can achieve state-of-the-art performance on commonly used datasets.
arXiv Detail & Related papers (2025-11-08T03:46:58Z)
- KM-GPT: An Automated Pipeline for Reconstructing Individual Patient Data from Kaplan-Meier Plots [45.53914693601933]
We develop KM-GPT, the first fully automated, AI-powered pipeline for reconstructing IPD directly from Kaplan-Meier plots. KM-GPT integrates advanced image preprocessing, multi-modal reasoning powered by GPT-5, and iterative reconstruction algorithms. Its hybrid reasoning architecture automates the conversion of unstructured information into structured data flows. KM-GPT was rigorously evaluated on synthetic and real-world datasets, consistently demonstrating superior accuracy.
arXiv Detail & Related papers (2025-09-15T00:38:38Z)
- PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation [14.124553708665117]
A novel wavelet-based approach for physiological signal analysis is presented. Two large-scale pretrained models specific to EMG and ECG are introduced for the first time. A unified multi-modal framework is constructed by integrating a pretrained EEG model.
arXiv Detail & Related papers (2025-06-12T05:11:41Z)
- PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design. PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z)
- PaPaGei: Open Foundation Models for Optical Physiological Signals [8.78925327256804]
Photoplethysmography is the leading non-invasive technique for monitoring biosignals and cardiovascular health. Machine learning models trained on PPG signals tend to be task-specific and struggle with generalization. We present PaPaGei, the first open foundation model for PPG signals.
arXiv Detail & Related papers (2024-10-27T18:18:06Z)
- Amplitude-Independent Machine Learning for PPG through Visibility Graphs and Transfer Learning [16.79885220470521]
Photoplethysmography (PPG) refers to the measurement of variations in blood volume using light.
Photoplethysmography signals provide insight into the body's circulatory system.
Photoplethysmography signals can be employed to extract various bio-features, such as heart rate and vascular ageing.
arXiv Detail & Related papers (2023-05-23T13:41:52Z)
- PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose two end-to-end video transformer based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features from non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.