Evaluating the performance of personal, social, health-related,
biomarker and genetic data for predicting an individuals future health using
machine learning: A longitudinal analysis
- URL: http://arxiv.org/abs/2104.12516v1
- Date: Mon, 26 Apr 2021 12:31:40 GMT
- Title: Evaluating the performance of personal, social, health-related,
biomarker and genetic data for predicting an individuals future health using
machine learning: A longitudinal analysis
- Authors: Mark Green
- Abstract summary: The aim of the study is to apply a machine learning approach to identify the relative contribution of personal, social, health-related, biomarker and genetic data as predictors of future health in individuals.
Two machine learning approaches were used to build predictive models: deep learning via neural networks and XGBoost.
Results found that health-related measures had the strongest prediction of future health status, with genetic data performing poorly.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As we gain access to a greater depth and range of health-related information
about individuals, three questions arise: (1) Can we build better models to
predict individual-level risk of ill health? (2) How much data do we need to
effectively predict ill health? (3) Are new methods required to process the
added complexity that new forms of data bring? The aim of the study is to apply
a machine learning approach to identify the relative contribution of personal,
social, health-related, biomarker and genetic data as predictors of future
health in individuals. Using longitudinal data from 6830 individuals in the UK
from Understanding Society (2010-12 to 2015-17), the study compares the
predictive performance of five types of measures: personal (e.g. age, sex),
social (e.g. occupation, education), health-related (e.g. body weight, grip
strength), biomarker (e.g. cholesterol, hormones) and genetic single nucleotide
polymorphisms (SNPs). The predicted outcome variable was limiting long-term
illness one and five years from baseline. Two machine learning approaches were
used to build predictive models: deep learning via neural networks and XGBoost
(gradient boosting decision trees). Model fit was compared to traditional
logistic regression models. Results found that health-related measures had the
strongest prediction of future health status, with genetic data performing
poorly. Machine learning models only offered marginal improvements in model
accuracy when compared to logistic regression models, but also performed well
on other metrics e.g. neural networks were best on AUC and XGBoost on
precision. The study suggests that increasing complexity of data and methods
does not necessarily translate to improved understanding of the determinants of
health or performance of predictive models of ill health.
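The abstract notes that different models led on different metrics (neural networks on AUC, XGBoost on precision). As a rough illustration of how that can happen, the sketch below computes accuracy, precision and AUC from scratch for two sets of predicted probabilities. The labels and scores are made-up toy values, not results from the study, and the names `model_a`, `model_b` and `evaluate` are hypothetical.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted positives that are true positives."""
    predicted_pos = sum(1 for p in y_pred if p == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / predicted_pos if predicted_pos else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (pairwise formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores, then report all three metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "precision": precision(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
    }


# Toy data (assumed for illustration): 1 = limiting long-term illness.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# model_a ranks every case correctly but flags too many people at 0.5.
model_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2, 0.1]
# model_b flags few people, all correctly, but ranks one case badly.
model_b = [0.9, 0.8, 0.2, 0.4, 0.3, 0.3, 0.25, 0.1]

results = {name: evaluate(y_true, scores)
           for name, scores in [("model_a", model_a), ("model_b", model_b)]}

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In this toy case `model_a` has the higher AUC while `model_b` has the higher precision and accuracy, which is why a single metric rarely settles which model is "best".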
Related papers
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI [0.0]
We evaluate and compare the classification accuracy, precision, recall, and F-1 scores of five different machine learning methods.
XGBoost achieved the best model accuracy, which is 97%.
arXiv Detail & Related papers (2024-04-06T17:23:21Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- Towards a Transportable Causal Network Model Based on Observational Healthcare Data [1.333879175460266]
We propose a novel approach that combines selection diagrams, missingness graphs, causal discovery and prior knowledge into a single graphical model.
We learn this model from data comprising two different cohorts of patients.
The resulting causal network model is validated by expert clinicians in terms of risk assessment, accuracy and explainability.
arXiv Detail & Related papers (2023-11-13T13:23:31Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Machine Learning-based Biological Ageing Estimation Technologies: A Survey [2.9554549423413303]
We mainly review three age prediction methods that use machine learning (ML).
They are based on blood biomarkers, facial images, and structural features.
The prediction accuracy is not very good, which limits the contribution these methods can make to the medical field.
arXiv Detail & Related papers (2022-06-25T13:38:39Z)
- Interpretable machine learning for high-dimensional trajectories of aging health [0.0]
We have built a computational model for individual aging trajectories of health and survival.
It contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information.
Our model is scalable to large longitudinal data sets and infers an interpretable network of directed interactions between the health variables.
arXiv Detail & Related papers (2021-05-07T17:42:15Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Deep learning for prediction of population health costs [0.0]
We developed a deep neural network to predict future cost from health insurance claims records.
We applied the deep network and a ridge regression model to a sample of 1.4 million German insurants to predict total one-year health care costs.
arXiv Detail & Related papers (2020-03-06T23:33:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.