FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313
- URL: http://arxiv.org/abs/2407.09355v1
- Date: Fri, 12 Jul 2024 15:28:13 GMT
- Title: FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313
- Authors: Aaron Ge, Jeya Balasubramanian, Xueyao Wu, Peter Kraft, Jonas S. Almeida,
- Abstract summary: Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information.
We introduce a baseline for a novel genotype imputation pipeline that supports client-sided imputation models generalizable across any genotyping chip and genomic region.
We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe.
- Score: 0.587470288031402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alternative by predicting missing genotypes without external databases, thereby enhancing privacy and accessibility. However, these methods often produce models with tens of millions of parameters, leading to challenges such as the need for substantial computational resources to train and inefficiency for client-sided deployment. Our study addresses these limitations by introducing a baseline for a novel genotype imputation pipeline that supports client-sided imputation models generalizable across any genotyping chip and genomic region. This approach enhances patient privacy by performing imputation directly on edge devices. As a case study, we focus on PRS313, a polygenic risk score comprising 313 SNPs used for breast cancer risk prediction. Utilizing consumer genetic panels such as 23andMe, our model democratizes access to personalized genetic insights by allowing 23andMe users to obtain their PRS313 score. We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe. Our linear regression model achieved an R^2 of 0.86, compared to 0.33 without imputation and 0.28 with simple imputation (substituting missing SNPs with the minor allele frequency). These findings suggest that popular SNP analysis libraries could benefit from integrating linear regression models for genotype imputation, providing a viable and light-weight alternative to reference based imputation.
Related papers
- Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specially MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - Segmentation of Non-Small Cell Lung Carcinomas: Introducing DRU-Net and Multi-Lens Distortion [0.1935997508026988]
We are proposing a segmentation model (DRU-Net) that can provide a delineation of human non-small cell lung carcinomas.
We have used two datasets (Norwegian Lung Cancer Biobank and Haukeland University Hospital lung cancer cohort) to create our proposed model.
The proposed spatial augmentation method (multi-lens distortion) improved the network performance by 3%.
arXiv Detail & Related papers (2024-06-20T13:14:00Z) - Kernel Cox partially linear regression: building predictive models for
cancer patients' survival [4.230753712933184]
We build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model.
We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and non-parametric predictors.
Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.
arXiv Detail & Related papers (2023-10-11T04:27:54Z) - Developing a Novel Image Marker to Predict the Clinical Outcome of Neoadjuvant Chemotherapy (NACT) for Ovarian Cancer Patients [1.7623658472574557]
Neoadjuvant chemotherapy (NACT) is one kind of treatment for advanced stage ovarian cancer patients.
Partial responses to NACT may lead to suboptimal debulking surgery, which will result in adverse prognosis.
We developed a novel image marker to achieve high accuracy prognosis prediction of NACT at an early stage.
arXiv Detail & Related papers (2023-09-13T16:59:50Z) - Clinical Deterioration Prediction in Brazilian Hospitals Based on
Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD)
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - Optimize Deep Learning Models for Prediction of Gene Mutations Using
Unsupervised Clustering [6.494144125433731]
Deep learning has become the mainstream methodological choice for analyzing and interpreting whole-slide digital pathology images.
In this paper, we proposed an unsupervised clustering-based multiple-instance learning, and apply our method to develop deep-learning models for prediction of gene mutations using WSIs from three cancer types.
We showed that unsupervised clustering of image patches could help identify predictive patches, exclude patches lack of predictive information, and therefore improve prediction on gene mutations in all three different cancer types.
arXiv Detail & Related papers (2022-03-31T11:48:21Z) - Segmentation by Test-Time Optimization (TTO) for CBCT-based Adaptive
Radiation Therapy [2.5705729402510338]
Traditional or deep learning (DL) based deformable image registration (DIR) can achieve improved results in many situations.
We propose a method called test-time optimization (TTO) to refine a pre-trained DL-based DIR population model.
Our proposed method is less susceptible to the generalizability problem, and thus can improve overall performance of different DL-based DIR models.
arXiv Detail & Related papers (2022-02-08T16:34:22Z) - StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact
Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out of distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre and post-processing steps, creating a UAD pipeline (StRegA)
The proposed pipeline achieved a Dice score of 0.642$pm$0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859$pm$0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z) - Wide & Deep neural network model for patch aggregation in CNN-based
prostate cancer detection systems [51.19354417900591]
Prostate cancer (PCa) is one of the leading causes of death among men, with almost 1.41 million new cases and around 375,000 deaths in 2020.
To perform an automatic diagnosis, prostate tissue samples are first digitized into gigapixel-resolution whole-slide images.
Small subimages called patches are extracted and predicted, obtaining a patch-level classification.
arXiv Detail & Related papers (2021-05-20T18:13:58Z) - Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantees universal approximation, local linearity and equivalence to other classes of hybrid system.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX)
The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.