Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
- URL: http://arxiv.org/abs/2404.17626v2
- Date: Tue, 7 May 2024 16:21:28 GMT
- Title: Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
- Authors: Thomas Le Menestrel, Erin Craig, Robert Tibshirani, Trevor Hastie, Manuel Rivas
- Abstract summary: Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
- Score: 69.90493129893112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals, underscoring a critical gap in genetic research. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso in disease prediction focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British and other ancestries and validated across a cohort of over 96,000 individuals for 8 diseases. Out of 96 models trained, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores (p-value < 0.05), found for diabetes, arthritis, gall stones, cystitis, asthma and osteoarthritis. For the interaction and pretrained models that outperformed the baseline, the PRS score was the primary driver behind prediction. Our findings indicate that both interaction terms and pre-training can enhance prediction accuracy, but only for a limited set of diseases and with moderate improvements in accuracy.
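The abstract reports "incremental predictive performance in terms of ROC-AUC scores", that is, the gain in ROC-AUC of an interaction or pretrained model over a baseline. As a minimal sketch of that comparison, the Python below computes ROC-AUC directly from its rank interpretation and takes the difference between two models. The function name `roc_auc`, the variable names, and the toy labels and scores are all illustrative assumptions, not data or code from the paper, and the significance test behind the reported p-values is not specified in the abstract, so it is not reproduced here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Tied scores count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy, made-up data for illustration only (not from the paper).
labels = [0, 0, 1, 1, 0, 1]
baseline_scores = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]   # hypothetical baseline model
candidate_scores = [0.1, 0.4, 0.6, 0.8, 0.3, 0.5]  # hypothetical interaction model

auc_baseline = roc_auc(labels, baseline_scores)
auc_candidate = roc_auc(labels, candidate_scores)

# Incremental performance: how much ROC-AUC the candidate adds over the baseline.
delta_auc = auc_candidate - auc_baseline
print(f"baseline={auc_baseline:.3f} candidate={auc_candidate:.3f} delta={delta_auc:.3f}")
```

The pairwise loop is quadratic in the number of individuals, which is fine for a toy example; for a cohort of the size described above, a rank-based implementation from an established library would be the practical choice.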
Related papers
- Enhancing End Stage Renal Disease Outcome Prediction: A Multi-Sourced Data-Driven Approach [7.212939068975618]
We used data on 10,326 CKD patients, combining their clinical and claims information from 2009 to 2018.
A 24-month observation window was identified as optimal for balancing early detection and prediction accuracy.
The 2021 eGFR equation improved prediction accuracy and reduced racial bias, notably for African American patients.
arXiv Detail & Related papers (2024-10-02T03:21:01Z)
- From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [50.80532910808962]
We present GluFormer, a generative foundation model on biomedical temporal data based on a transformer architecture.
GluFormer generalizes to 15 different external datasets, including 4936 individuals across 5 different geographical regions.
It can also predict onset of future health outcomes even 4 years in advance.
arXiv Detail & Related papers (2024-08-20T13:19:06Z)
- Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study [12.4123972735841]
Primary Immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients.
There is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcome.
We conduct a feasibility study to check if machine learning can be applied effectively for diagnosis of ITP using routine blood tests and demographic data in a non-acute outpatient setting.
arXiv Detail & Related papers (2024-05-31T01:04:46Z)
- Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience [21.420302408947194]
We develop a weighted empirical risk minimization approach that optimally combines data from a source group to make predictions on a target group.
We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of Alzheimer's disease and estimation of brain age.
arXiv Detail & Related papers (2023-08-06T18:05:39Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Individualized Prediction of COVID-19 Adverse outcomes with MLHO [9.197411456718708]
We developed an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health outcomes.
We modeled the four adverse outcomes utilizing about 600 features representing patients' pre-COVID health records and demographics.
Our results demonstrated that while demographic variables are important predictors of adverse outcomes after a COVID-19 infection, the incorporation of past clinical records is vital for a reliable prediction model.
arXiv Detail & Related papers (2020-08-10T02:44:52Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- Short Term Blood Glucose Prediction based on Continuous Glucose Monitoring Data [53.01543207478818]
This study explores the use of Continuous Glucose Monitoring (CGM) data as input for digital decision support tools.
We investigate how Recurrent Neural Networks (RNNs) can be used for Short Term Blood Glucose (STBG) prediction.
arXiv Detail & Related papers (2020-02-06T16:39:44Z)