Improving genetic risk prediction across diverse population by
disentangling ancestry representations
- URL: http://arxiv.org/abs/2205.04673v1
- Date: Tue, 10 May 2022 05:05:37 GMT
- Title: Improving genetic risk prediction across diverse population by
disentangling ancestry representations
- Authors: Prashnna K Gyawali, Yann Le Guen, Xiaoxia Liu, Hua Tang, James Zou,
Zihuai He
- Abstract summary: We propose a novel deep-learning framework that disentangles ancestry from the phenotype-relevant information in its representation.
The ancestry disentangled representation can be used to build risk predictors that perform better across minority populations.
- Score: 10.803542340843368
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Risk prediction models using genetic data have seen increasing traction in
genomics. However, most of the polygenic risk models were developed using data
from participants with similar (mostly European) ancestry. This can lead to
biases in the risk predictors resulting in poor generalization when applied to
minority populations and admixed individuals such as African Americans. To
address this bias, largely due to the prediction models being confounded by the
underlying population structure, we propose a novel deep-learning framework
that leverages data from diverse populations and disentangles ancestry from the
phenotype-relevant information in its representation. The ancestry disentangled
representation can be used to build risk predictors that perform better across
minority populations. We applied the proposed method to the analysis of
Alzheimer's disease genetics. Compared with standard linear and nonlinear risk
prediction methods, the proposed method substantially improves risk prediction
in minority populations, particularly for admixed individuals.
Related papers
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Developing a Novel Holistic, Personalized Dementia Risk Prediction Model via Integration of Machine Learning and Network Systems Biology Approaches [0.0]
The proposed framework utilizes a novel holistic approach to dementia risk prediction.
It is the first to incorporate various sources of environmental pollution and lifestyle factor data with network systems biology-based genetic data.
The developed model successfully employs holistic computational dementia risk prediction for clinical use.
arXiv Detail & Related papers (2023-10-04T02:47:29Z)
- Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z)
- Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach [7.467496975496821]
We propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions.
We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations, and reduces the gap of model performance across populations.
arXiv Detail & Related papers (2021-08-27T04:04:34Z)
- Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction [3.10560974227074]
We develop a surrogate assisted semi-supervised-learning (SAS) approach to risk modeling with high dimensional predictors.
We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model.
arXiv Detail & Related papers (2021-05-04T03:08:51Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- A General Framework for Survival Analysis and Multi-State Modelling [70.31153478610229]
We use neural ordinary differential equations as a flexible and general method for estimating multi-state survival models.
We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.
arXiv Detail & Related papers (2020-06-08T19:24:54Z)
- Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)