Large-scale Training of Foundation Models for Wearable Biosignals
- URL: http://arxiv.org/abs/2312.05409v2
- Date: Wed, 6 Mar 2024 18:18:15 GMT
- Title: Large-scale Training of Foundation Models for Wearable Biosignals
- Authors: Salar Abbaspourazad, Oussama Elachqar, Andrew C. Miller, Saba Emrani,
Udhyakumar Nallasamy, Ian Shapiro
- Abstract summary: Tracking biosignals is crucial for monitoring wellness and preempting the development of severe medical conditions.
Despite widespread use of wearable devices and existing digital biomarkers, the absence of curated data with annotated medical labels hinders the development of new biomarkers.
We train foundation models for two common biosignals: photoplethysmography (PPG) and electrocardiogram (ECG).
- Score: 1.8291790356553643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking biosignals is crucial for monitoring wellness and preempting the
development of severe medical conditions. Today, wearable devices can
conveniently record various biosignals, creating the opportunity to monitor
health status without disruption to one's daily routine. Despite widespread use
of wearable devices and existing digital biomarkers, the absence of curated
data with annotated medical labels hinders the development of new biomarkers to
measure common health conditions. In fact, medical datasets are usually small
in comparison to other domains, which is an obstacle for developing neural
network models for biosignals. To address this challenge, we have employed
self-supervised learning using the unlabeled sensor data collected under
informed consent from the large longitudinal Apple Heart and Movement Study
(AHMS) to train foundation models for two common biosignals:
photoplethysmography (PPG) and electrocardiogram (ECG) recorded on Apple Watch.
We curated PPG and ECG datasets from AHMS that include data from ~141K
participants spanning ~3 years. Our self-supervised learning framework includes
participant level positive pair selection, stochastic augmentation module and a
regularized contrastive loss optimized with momentum training, and generalizes
well to both PPG and ECG modalities. We show that the pre-trained foundation
models readily encode information regarding participants' demographics and
health conditions. To the best of our knowledge, this is the first study that
builds foundation models using large-scale PPG and ECG data collected via
wearable consumer devices; prior works have commonly used
smaller-size datasets collected in clinical and experimental settings. We
believe PPG and ECG foundation models can enhance future wearable devices by
reducing the reliance on labeled data and hold the potential to help the users
improve their health.
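The framework components named in the abstract (participant-level positive pair selection, stochastic augmentation, a contrastive loss, and momentum training) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the specific augmentations (amplitude scaling and Gaussian noise), the plain InfoNCE loss, and all default values are hypothetical, and the regularization term mentioned in the abstract is omitted because the abstract does not specify it.

```python
import numpy as np


def sample_positive_pairs(participant_ids, rng):
    """For each participant with at least two segments, pick two distinct segment indices."""
    ids = np.asarray(participant_ids)
    pairs = []
    for pid in np.unique(ids):
        idx = np.flatnonzero(ids == pid)
        if idx.size >= 2:
            a, b = rng.choice(idx, size=2, replace=False)
            pairs.append((int(a), int(b)))
    return pairs


def augment(x, rng, noise_std=0.05):
    """Hypothetical stochastic augmentation: random amplitude scaling plus Gaussian noise."""
    scale = rng.uniform(0.8, 1.2)
    return scale * x + rng.normal(0.0, noise_std, size=x.shape)


def info_nce(z_a, z_b, temperature=0.1):
    """Plain InfoNCE: row i of z_a is the positive of row i of z_b; other rows act as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def momentum_update(target_w, online_w, m=0.99):
    """Exponential moving average of the online encoder weights (momentum training)."""
    return m * target_w + (1.0 - m) * online_w
```

In this sketch, two segments from the same participant form a positive pair, each is augmented independently, and the embeddings of the two views are compared with `info_nce`; the target encoder's weights would then track the online encoder's through `momentum_update`.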
Related papers
- From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [50.80532910808962]
We present GluFormer, a generative foundation model on biomedical temporal data based on a transformer architecture.
GluFormer generalizes to 15 different external datasets, including 4936 individuals across 5 different geographical regions.
It can also predict onset of future health outcomes even 4 years in advance.
arXiv Detail & Related papers (2024-08-20T13:19:06Z)
- SSSD-ECG-nle: New Label Embeddings with Structured State-Space Models for ECG generation [0.0]
Diffusion models have made significant progress in recent years, making it possible to synthesize data comparable to real data.
We propose the SSSD-ECG-nle architecture based on SSSD-ECG with a modified conditioning mechanism and demonstrate its efficiency on downstream tasks.
arXiv Detail & Related papers (2024-07-15T16:31:25Z)
- Scaling Representation Learning from Ubiquitous ECG with State-Space Models [28.776392386988043]
We introduce WildECG, a pre-trained state-space model for representation learning from ECG signals.
We train this model in a self-supervised manner with 275,000 10s ECG recordings collected in the wild and evaluate it on a range of downstream tasks.
arXiv Detail & Related papers (2023-09-26T22:08:19Z)
- GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction [20.8603653664403]
We propose a physiologically-inspired data augmentation method to improve performance and increase the robustness of heart disease detection based on ECG signals.
We obtain augmented samples by perturbing the data distribution towards other classes along the geodesic in Wasserstein space.
Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions.
arXiv Detail & Related papers (2022-08-02T03:14:13Z)
- Performer: A Novel PPG to ECG Reconstruction Transformer For a Digital Biomarker of Cardiovascular Disease Detection [0.0]
Cardiovascular diseases (CVDs) have become the leading cause of death; three-quarters of these deaths occur in lower-income communities.
Electrocardiography (ECG) is infeasible for continuous cardiac monitoring due to its requirement for user participation.
Photoplethysmography is easy to collect, but the limited accuracy constrains its clinical usage.
arXiv Detail & Related papers (2022-04-25T17:10:13Z)
- 2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets [89.84774119537087]
We design two transfer learning challenges around diagnostics and Brain-Computer Interfacing (BCI).
Task 1 is centred on medical diagnostics, addressing automatic sleep stage annotation across subjects.
Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets.
arXiv Detail & Related papers (2022-02-14T12:12:20Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Learning Generalizable Physiological Representations from Large-scale Wearable Data [12.863826659440026]
We present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels.
We show that the resulting embeddings can generalize in various downstream tasks through transfer learning with linear classifiers.
Overall, we propose the first multimodal self-supervised method for behavioral and physiological data with implications for large-scale health and lifestyle monitoring.
arXiv Detail & Related papers (2020-11-09T17:56:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.