Deep neural network improves the estimation of polygenic risk scores for
breast cancer
- URL: http://arxiv.org/abs/2307.13010v1
- Date: Mon, 24 Jul 2023 13:35:36 GMT
- Authors: Adrien Badré, Li Zhang, Wellington Muchero, Justin C. Reynolds,
Chongle Pan
- Abstract summary: Polygenic risk scores (PRS) estimate the genetic risk of an individual for a complex disease based on many genetic variants across the whole genome.
A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms.
- Score: 3.9918594409417576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Polygenic risk scores (PRS) estimate the genetic risk of an individual for a
complex disease based on many genetic variants across the whole genome. In this
study, we compared a series of computational models for estimation of breast
cancer PRS. A deep neural network (DNN) was found to outperform alternative
machine learning techniques and established statistical algorithms, including
BLUP, BayesA and LDpred. In the test cohort with 50% prevalence, the Area Under
the receiver operating characteristic Curve (AUC) was 67.4% for DNN, 64.2% for
BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all
generated PRS that followed a normal distribution in the case population.
However, the PRS generated by DNN in the case population followed a bi-modal
distribution composed of two normal distributions with distinctly different
means. This suggests that DNN was able to separate the case population into a
high-genetic-risk case sub-population with an average PRS significantly higher
than the control population and a normal-genetic-risk case sub-population with
an average PRS similar to the control population. This allowed DNN to achieve
18.8% recall at 90% precision in the test cohort with 50% prevalence, which can
be extrapolated to 65.4% recall at 20% precision in a general population with
12% prevalence. Interpretation of the DNN model identified salient variants
that were assigned insignificant p-values by association studies, but were
important for DNN prediction. These variants may be associated with the
phenotype through non-linear relationships.
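The prevalence adjustment mentioned in the abstract can be illustrated with a short calculation. The sketch below is not the authors' code: the function names are hypothetical, and it assumes that recall and false-positive rate at a fixed score threshold do not change with prevalence. It only re-expresses the reported 18.8% recall at 90% precision operating point at 12% prevalence. The 65.4% recall at 20% precision figure quoted in the abstract corresponds to a different operating point and is not derived here.

```python
def fpr_from_precision(recall, precision, prevalence):
    """False-positive rate implied by a given recall, precision and prevalence.

    Derived from precision = r*p / (r*p + f*(1 - p)), solved for f.
    """
    return recall * prevalence * (1.0 - precision) / (precision * (1.0 - prevalence))


def precision_at_prevalence(recall, fpr, prevalence):
    """Precision at a new prevalence, assuming recall and FPR stay fixed."""
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Operating point reported in the abstract for the 50%-prevalence test cohort.
recall = 0.188
fpr = fpr_from_precision(recall, precision=0.90, prevalence=0.50)

# Same threshold, re-evaluated at the 12% prevalence quoted in the abstract.
p12 = precision_at_prevalence(recall, fpr, prevalence=0.12)
print(round(p12, 3))  # 0.551
```

Under these assumptions, the threshold that gives 90% precision in the balanced cohort would give roughly 55% precision in a population with 12% prevalence, which shows how strongly precision depends on prevalence.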
Related papers
- Artificial Intelligence-Based Triaging of Cutaneous Melanocytic Lesions [0.8864540224289991]
Pathologists are facing an increasing workload due to a growing volume of cases and the need for more comprehensive diagnoses.
We developed an artificial intelligence (AI) model for triaging cutaneous melanocytic lesions based on whole slide images.
arXiv Detail & Related papers (2024-10-14T13:49:04Z) - FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313 [0.587470288031402]
Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information.
We introduce a baseline for a novel genotype imputation pipeline that supports client-sided imputation models generalizable across any genotyping chip and genomic region.
We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe.
arXiv Detail & Related papers (2024-07-12T15:28:13Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - BIO-CXRNET: A Robust Multimodal Stacking Machine Learning Technique for
Mortality Risk Prediction of COVID-19 Patients using Chest X-Ray Images and
Clinical Data [0.0]
This study uses 25 biomarkers and CXR images to predict mortality risk in 930 COVID-19 patients admitted in Italy.
The proposed multimodal stacking technique achieved precision, sensitivity, and F1-score of 89.03%, 90.44%, and 89.03%, respectively.
The nomogram-based scoring technique was able to predict the death probability of high-risk patients with an F1-score of 92.88%.
arXiv Detail & Related papers (2022-06-15T15:23:43Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer's disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - Osteoporosis Prescreening using Panoramic Radiographs through a Deep
Convolutional Neural Network with Attention Mechanism [65.70943212672023]
A deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs.
A dataset of 70 panoramic radiographs (PRs) from 70 different subjects aged between 49 and 60 was used.
arXiv Detail & Related papers (2021-10-19T00:03:57Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors
and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z) - Spatial-And-Context aware (SpACe) "virtual biopsy" radiogenomic maps to
target tumor mutational status on structural MRI [0.7573687311514342]
We present "virtual biopsy" maps that incorporate context features from co-localized biopsy sites along with spatial priors from population atlases.
SpACe maps obtained training and testing accuracies of 90% (n=71) and 90.48% (n=21) in identifying EGFR amplification status.
SpACe maps could provide surgical navigation to improve localization of sampling sites for targeting of specific driver genes in cancer.
arXiv Detail & Related papers (2020-06-17T13:57:59Z) - Addressing Artificial Intelligence Bias in Retinal Disease Diagnostics [5.34430209078787]
This study evaluated generative methods to potentially mitigate AI bias when diagnosing diabetic retinopathy.
Deep learning systems (DLS) face concepts at test/inference time they were not initially trained on.
Findings illustrate how data imbalance and domain generalization can lead to disparity of accuracy across subpopulations.
arXiv Detail & Related papers (2020-04-28T13:46:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.