A Machine Learning Approach to Predict Biological Age and its Longitudinal Drivers
- URL: http://arxiv.org/abs/2508.09747v1
- Date: Wed, 13 Aug 2025 12:22:12 GMT
- Title: A Machine Learning Approach to Predict Biological Age and its Longitudinal Drivers
- Authors: Nazira Dunbayeva, Yulong Li, Yutong Xie, Imran Razzak,
- Abstract summary: We develop a machine learning pipeline to predict age using a longitudinal cohort with data from two distinct time periods. By engineering novel features that explicitly capture the rate of change (slope) of key biomarkers over time, we significantly improved model performance. Our framework paves the way for clinical tools that dynamically track patient health trajectories, enabling early intervention and personalized prevention strategies.
- Score: 22.162067953837653
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Predicting an individual's aging trajectory is a central challenge in preventative medicine and bioinformatics. While machine learning models can predict chronological age from biomarkers, they often fail to capture the dynamic, longitudinal nature of the aging process. In this work, we developed and validated a machine learning pipeline to predict age using a longitudinal cohort with data from two distinct time periods (2019-2020 and 2021-2022). We demonstrate that a model using only static, cross-sectional biomarkers has limited predictive power when generalizing to future time points. However, by engineering novel features that explicitly capture the rate of change (slope) of key biomarkers over time, we significantly improved model performance. Our final LightGBM model, trained on the initial wave of data, successfully predicted age in the subsequent wave with high accuracy ($R^2 = 0.515$ for males, $R^2 = 0.498$ for females), significantly outperforming both traditional linear models and other tree-based ensembles. SHAP analysis of our successful model revealed that the engineered slope features were among the most important predictors, highlighting that an individual's health trajectory, not just their static health snapshot, is a key determinant of biological age. Our framework paves the way for clinical tools that dynamically track patient health trajectories, enabling early intervention and personalized prevention strategies for age-related diseases.
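The abstract does not say exactly how the slope features or the per-sex evaluation are computed. The following is a minimal Python sketch under stated assumptions: each participant has one visit in each wave, and a biomarker's slope is its per-year change between the two visits. The `Visit` record, its fields, the helper names, and biomarker names such as `sbp` and `hba1c` are hypothetical and not taken from the paper. The sketch also computes R² separately within each sex group, matching how the results are reported.

```python
from dataclasses import dataclass


# Hypothetical record: one participant's measurements at a single wave.
@dataclass
class Visit:
    time_years: float             # visit time as a decimal year, e.g. 2019.5 (assumed)
    biomarkers: dict[str, float]  # illustrative names such as "sbp", "hba1c" (hypothetical)


def slope_features(first: Visit, second: Visit, names: list[str]) -> dict[str, float]:
    """Per-year rate of change of each biomarker between two visits.

    With exactly two time points, this finite difference equals the
    least-squares slope through both measurements.
    """
    dt = second.time_years - first.time_years
    if dt <= 0:
        raise ValueError("the second visit must come after the first")
    return {
        f"{name}_slope": (second.biomarkers[name] - first.biomarkers[name]) / dt
        for name in names
    }


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def r_squared_by_group(
    groups: list[str], y_true: list[float], y_pred: list[float]
) -> dict[str, float]:
    """R^2 computed separately within each group label (e.g. sex)."""
    scores = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        scores[g] = r_squared([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return scores


if __name__ == "__main__":
    v1 = Visit(2019.5, {"sbp": 120.0, "hba1c": 5.5})
    v2 = Visit(2021.5, {"sbp": 128.0, "hba1c": 6.0})
    print(slope_features(v1, v2, ["sbp", "hba1c"]))
    # {'sbp_slope': 4.0, 'hba1c_slope': 0.25}
```

In a full pipeline, the slope columns would be fed to a regressor such as the LightGBM model described above alongside the static biomarkers. That step is left out here so the sketch runs without third-party dependencies.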
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.
We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.
Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
- Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging [28.20331959292183]
We developed and rigorously validated a multi-omics aging clock that robustly predicts diverse health outcomes and future disease risk.
Unotype clustering of the integrated molecular profiles from multi-omics uncovered distinct biological subtypes of aging.
These findings demonstrate the power of multi-omics integration to decode the molecular landscape of aging and lay the groundwork for personalized healthspan monitoring and precision strategies to prevent age-related diseases.
arXiv Detail & Related papers (2025-10-14T11:00:51Z)
- A Vector-Quantized Foundation Model for Patient Behavior Monitoring [43.02353546717171]
This paper introduces a novel foundation model based on a modified vector quantized variational autoencoder, specifically designed to process real-world data from smartphones and wearable devices.
We leveraged the discrete latent representation of this model to effectively perform two downstream tasks, suicide risk assessment and emotional state prediction, on different held-out clinical cohorts without the need for fine-tuning.
arXiv Detail & Related papers (2025-03-19T14:01:16Z)
- iTARGET: Interpretable Tailored Age Regression for Grouped Epigenetic Traits [0.0]
We propose a novel two-phase algorithm to accurately predict chronological age from DNA methylation patterns.
Our method not only improves prediction accuracy but also reveals key age-related CpG sites, detects age-specific changes in aging rates, and identifies pairwise interactions between CpG sites.
Experimental results show that our approach outperforms traditional epigenetic clocks and machine learning models.
arXiv Detail & Related papers (2025-01-04T23:06:46Z)
- Towards modeling evolving longitudinal health trajectories with a transformer-based deep learning model [19.49711465571333]
We introduce a straightforward approach for training a Transformer-based deep learning model in a way that lets us analyze how individuals' trajectories change over time.
We focus here on a general task of predicting the onset of a range of common diseases in a given future forecast interval.
We find that this model performs comparably to other models, including a bi-directional transformer model, in terms of basic prediction performance.
arXiv Detail & Related papers (2024-12-12T02:13:53Z)
- Time-to-Event Pretraining for 3D Medical Imaging [44.46415168541444]
We introduce time-to-event pretraining, a pretraining framework for 3D medical imaging models.
We use a dataset of 18,945 CT scans (4.2 million 2D images) and time-to-event distributions across thousands of EHR-derived tasks.
Our method improves outcome prediction, achieving an average AUROC increase of 23.7% and a 29.4% gain in Harrell's C-index across 8 benchmark tasks.
arXiv Detail & Related papers (2024-11-14T11:08:54Z)
- Improving Diffusion Models for ECG Imputation with an Augmented Template Prior [43.6099225257178]
Noisy and poor-quality recordings are a major issue for signals collected using mobile health systems.
Recent studies have explored the imputation of missing values in ECG with probabilistic time-series models.
We present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.
arXiv Detail & Related papers (2023-10-24T11:34:15Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Safe AI for health and beyond -- Monitoring to transform a health service [51.8524501805308]
We will assess the infrastructure required to monitor the outputs of a machine learning algorithm.
We will present two scenarios with examples of monitoring and updates of models.
arXiv Detail & Related papers (2023-03-02T17:27:45Z)
- Neurodevelopmental Phenotype Prediction: A State-of-the-Art Deep Learning Model [0.0]
We apply a deep neural network to analyse the cortical surface data of neonates.
Our goal is to identify neurodevelopmental biomarkers and to predict gestational age at birth based on these biomarkers.
arXiv Detail & Related papers (2022-11-16T11:15:23Z)
- Assessing the Performance of Automated Prediction and Ranking of Patient Age from Chest X-rays Against Clinicians [4.795478287106675]
Deep learning has been demonstrated to allow the accurate estimation of patient age from chest X-rays.
We present a novel comparative study of the performance of radiologists versus state-of-the-art deep learning models.
We train our models with a heterogeneous database of 1.8M chest X-rays with ground truth patient ages and investigate the limitations on model accuracy.
arXiv Detail & Related papers (2022-07-04T10:09:48Z)
- SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z)
- Evaluating the performance of personal, social, health-related, biomarker and genetic data for predicting an individuals future health using machine learning: A longitudinal analysis [0.0]
The aim of the study is to apply a machine learning approach to identify the relative contribution of personal, social, health-related, biomarker and genetic data as predictors of future health in individuals.
Two machine learning approaches were used to build predictive models: deep learning via neural networks and XGBoost.
Results found that health-related measures had the strongest prediction of future health status, with genetic data performing poorly.
arXiv Detail & Related papers (2021-04-26T12:31:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.