ProPath: Disease-Specific Protein Language Model for Variant
Pathogenicity
- URL: http://arxiv.org/abs/2311.03429v2
- Date: Wed, 8 Nov 2023 04:35:37 GMT
- Authors: Huixin Zhan, Zijun Zhang
- Abstract summary: We propose a disease-specific protein language model for variant pathogenicity, termed ProPath, to capture the pseudo-log-likelihood ratio in rare missense variants through a siamese network.
Our results demonstrate that ProPath surpasses the pre-trained ESM1b by more than 5% in AUC across both datasets.
- Score: 11.414690866985474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical variant classification of pathogenic versus benign genetic variants
remains a pivotal challenge in clinical genetics. Recently, the introduction of
protein language models has improved the accuracy of generic variant effect prediction
(VEP) via weakly-supervised or unsupervised training. However, these
VEPs are not disease-specific, limiting their adaptation at point-of-care. To
address this problem, we propose a disease-specific protein language
model for variant pathogenicity, termed ProPath, to capture the
pseudo-log-likelihood ratio in rare missense variants through a siamese
network. We evaluate the performance of ProPath against pre-trained language
models, using clinical variant sets in inherited cardiomyopathies and
arrhythmias that were not seen during training. Our results demonstrate that
ProPath surpasses the pre-trained ESM1b by more than 5% in AUC
across both datasets. Furthermore, our model achieved the highest performance
among all baselines on both datasets. Thus, ProPath offers a potent
disease-specific variant effect prediction, particularly valuable for disease
associations and clinical applicability.
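The pseudo-log-likelihood ratio (PLLR) mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_masked_log_prob` is a hypothetical stand-in for a masked protein language model such as ESM1b, the helper names are invented for illustration, and the sign convention (mutant minus wild type) is an assumption. The one point it tries to convey is the siamese idea: the same scorer, with shared parameters, is applied to both the wild-type and the mutant sequence, and the two scores are compared.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_masked_log_prob(sequence: str, position: int) -> float:
    """Hypothetical stand-in for a masked protein language model.

    Returns log p(sequence[position] | all other residues), estimated from
    add-one-smoothed residue counts in the context. A real model would
    replace this function.
    """
    context = sequence[:position] + sequence[position + 1:]
    count = context.count(sequence[position])
    return math.log((count + 1) / (len(context) + len(AMINO_ACIDS)))


def pseudo_log_likelihood(sequence: str) -> float:
    """Sum the masked log-probabilities over every position."""
    return sum(toy_masked_log_prob(sequence, i) for i in range(len(sequence)))


def apply_missense(wild_type: str, position: int, new_residue: str) -> str:
    """Return the sequence with one residue substituted (0-based position)."""
    if new_residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {new_residue}")
    if wild_type[position] == new_residue:
        raise ValueError("substitution does not change the residue")
    return wild_type[:position] + new_residue + wild_type[position + 1:]


def pllr(wild_type: str, mutant: str) -> float:
    """PLL(mutant) - PLL(wild type); one shared scorer handles both branches."""
    return pseudo_log_likelihood(mutant) - pseudo_log_likelihood(wild_type)


wild_type = "AAAAAA"
mutant = apply_missense(wild_type, 3, "K")
score = pllr(wild_type, mutant)
print(f"{wild_type} -> {mutant}: PLLR = {score:.3f}")
```

Under this assumed convention, a negative PLLR means the scorer finds the mutant sequence less plausible than the wild type. Disease-specific fine-tuning would adjust the shared scorer using labeled variants, which this toy example does not attempt.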
Related papers
- DYNA: Disease-Specific Language Model for Variant Pathogenicity [9.662269016653296]
We propose DYNA: Disease-specificity fine-tuning via a Siamese neural network.
We focus on various cardiovascular diseases, where gene-disease relationships of loss-of-function vs. gain-of-function dictate disease-specific VEP.
For non-coding VEPs, we apply DYNA to an essential post-transcriptional regulatory axis of RNA splicing, the most common non-coding pathogenic mechanism in established clinical VEP guidelines.
The DYNA fine-tuned models show superior performance in the held-out rare variant testing set and are further replicated in large, clinically-relevant variant annotations in
arXiv Detail & Related papers (2024-05-31T19:52:17Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - BeeTLe: A Framework for Linear B-Cell Epitope Prediction and
Classification [0.43512163406551996]
This paper presents a new deep learning-based framework for linear B-cell epitope prediction as well as antibody type-specific classification.
We propose an amino acid encoding method based on eigen decomposition to help the model learn the representations of antibodies.
Experimental results on data curated from the largest public database demonstrate the validity of the proposed methods.
arXiv Detail & Related papers (2023-09-05T09:18:29Z) - A Fully Automated and Explainable Algorithm for the Prediction of
Malignant Transformation in Oral Epithelial Dysplasia [8.927415909296819]
We develop an artificial intelligence algorithm that can assign an Oral Malignant Transformation (OMT) risk score.
The algorithm is based on the detection and segmentation of nuclei within (and around) the epithelium using an in-house segmentation model.
The proposed OMTscore yields an AUROC = 0.74 in predicting whether an OED progresses to malignancy or not.
arXiv Detail & Related papers (2023-07-06T19:11:00Z) - Unsupervised language models for disease variant prediction [3.6942566104432886]
We find that a single protein LM trained on broad sequence datasets can score pathogenicity for any gene variant zero-shot.
We show that it achieves scoring performance comparable to the state of the art when evaluated on clinically labeled variants of disease-related genes.
arXiv Detail & Related papers (2022-12-07T22:28:13Z) - Robust and Agnostic Learning of Conditional Distributional Treatment
Effects [62.44901952244514]
The conditional average treatment effect (CATE) is the best point prediction of individual causal effects.
In aggregate analyses, this is usually addressed by measuring distributional treatment effect (DTE).
We provide a new robust and model-agnostic methodology for learning the conditional DTE (CDTE) for a wide class of problems.
arXiv Detail & Related papers (2022-05-23T17:40:31Z) - A k-mer Based Approach for SARS-CoV-2 Variant Identification [55.78588835407174]
We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance.
We also show the importance of the different amino acids which play a key role in identifying variants and how they coincide with those reported by the USA's Centers for Disease Control and Prevention (CDC).
arXiv Detail & Related papers (2021-08-07T15:08:15Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Bayesian prognostic covariate adjustment [59.75318183140857]
Historical data about disease outcomes can be integrated into the analysis of clinical trials in many ways.
We build on existing literature that uses prognostic scores from a predictive model to increase the efficiency of treatment effect estimates.
arXiv Detail & Related papers (2020-12-24T05:19:03Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z)