Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation
Method
- URL: http://arxiv.org/abs/2012.06470v1
- Date: Fri, 11 Dec 2020 16:40:13 GMT
- Title: Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation
Method
- Authors: Ali Raza, Hafiz Saud Arshad
- Abstract summary: Some peptides exhibit low metabolic stability, high toxicity, and high hemolytic activity.
Traditional methods for evaluation of toxicity of peptides can be time-consuming and costly.
We propose a machine learning based method for prediction of hemolytic tendencies of peptides.
- Score: 3.110575781525886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous peptides discovered over the past decades exhibit
antimicrobial and anticancer activity, which makes them promising therapeutic
candidates. However, some peptides show low metabolic stability, high toxicity,
and high hemolytic activity. This makes it important to evaluate the hemolytic
tendency and toxicity of peptides before using them as therapeutics.
Traditional methods for evaluating peptide toxicity can be time-consuming and
costly. In this study, we extract peptide data (Hemo-DB) from the Database of
Antimicrobial Activity and Structure of Peptides (DBAASP) based on certain
hemolytic-activity criteria, and we present a machine-learning-based method for
predicting the hemolytic tendency of peptides (i.e., Hemolytic or
Non-Hemolytic). Our model offers a significant improvement on hemolysis
prediction benchmarks. We also propose a reliable clustering-based train-test
splitting method, which ensures that no peptide in the train set is more than
40% similar to any peptide in the test set. Using this train-test split, we can
obtain reliable estimates of expected model performance on an unseen data
distribution or on newly discovered peptides. Our model achieves 0.9986 AUC-ROC
(Area Under the Receiver Operating Characteristic Curve) and 97.79% accuracy on
the test set of Hemo-DB using the traditional random train-test splitting
method. Moreover, our model achieves an AUC-ROC of 0.997 and an accuracy of
97.58% when using the clustering-based train-test split. Furthermore, we
evaluate our model on an unseen data distribution (Hemo-PI 3) and record 0.8726
AUC-ROC and 79.5% accuracy. Using the proposed method, potential therapeutic
peptides can be screened, which may further advance therapeutics, and reliable
predictions can be obtained for peptides with unseen amino acid distributions
and for newly discovered peptides.
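The clustering-based train-test split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (`identity`, an edit-distance ratio), the single-linkage clustering, and the names `cluster_split`, `threshold`, and `test_fraction` are all assumptions made for illustration. The idea shown is that peptides more than 40% similar are forced into the same cluster, and whole clusters are then assigned to either the train or the test set, so no train peptide can exceed the threshold against any test peptide.

```python
import random
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Illustrative similarity (an assumption): 1 - edit distance / longer length."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # Cheapest of deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b), 1)


def cluster_split(peptides, threshold=0.40, test_fraction=0.2, seed=0):
    """Split so that no train/test pair has similarity above `threshold`."""
    parent = list(range(len(peptides)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single linkage: any pair above the threshold ends up in one cluster.
    for i, j in combinations(range(len(peptides)), 2):
        if identity(peptides[i], peptides[j]) > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, peptide in enumerate(peptides):
        clusters.setdefault(find(i), []).append(peptide)

    # Assign whole clusters to the test set until it reaches the target size.
    groups = list(clusters.values())
    random.Random(seed).shuffle(groups)
    target = test_fraction * len(peptides)
    train, test = [], []
    for group in groups:
        (test if len(test) < target else train).extend(group)
    return train, test


# Toy sequences, invented for this example only.
peptides = ["KKKKKKKKKK", "KKKKKKKKKA", "LLLLLLLLLL", "LLLLLLLLLG",
            "DEDEDEDEDE", "DEDEDEDEDA", "WWWWWWWW", "GGGGGGGG"]
train, test = cluster_split(peptides)
print("train:", train)
print("test:", test)
```

The pairwise comparison is quadratic in the number of peptides, so a dedicated sequence-clustering tool would likely be needed for a dataset the size of Hemo-DB. Because whole clusters are assigned, the test set size only approximates `test_fraction`.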
Related papers
- Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z)
- NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and π-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- PepGB: Facilitating peptide drug discovery via graph neural networks [36.744839520938825]
We propose PepGB, a deep learning framework to facilitate early-stage peptide drug discovery by predicting peptide-protein interactions (PepPIs).
We derive an extended version, diPepGB, to tackle the bottleneck of modeling highly imbalanced data prevalent in lead generation and optimization processes.
arXiv Detail & Related papers (2024-01-26T06:13:09Z)
- Prediction of MET Overexpression in Non-Small Cell Lung Adenocarcinomas from Hematoxylin and Eosin Images [0.4306805601880342]
MET protein overexpression is a targetable event in non-small cell lung cancer (NSCLC).
Development of pre-screening algorithms using digitized hematoxylin and eosin (H&E)-stained slides to predict MET overexpression could promote testing for those who will benefit most.
arXiv Detail & Related papers (2023-10-11T17:32:24Z)
- pLMFPPred: a novel approach for accurate prediction of functional peptides integrating embedding from pre-trained protein language model and imbalanced learning [7.5449239162950965]
pLMFPPred is a tool for predicting functional peptides and identifying toxic peptides.
On a validated independent test set, pLMFPPred achieved accuracy, area under the receiver operating characteristic curve (AUC-ROC), and F1-score values of 0.974, 0.99, and 0.974, respectively.
arXiv Detail & Related papers (2023-09-25T17:57:39Z)
- An Efficient Consolidation of Word Embedding and Deep Learning Techniques for Classifying Anticancer Peptides: FastText+BiLSTM [0.0]
Anticancer peptides (ACPs) are peptides with a higher degree of selectivity and safety.
Recent scientific advancements have generated interest in peptide-based therapies.
ACPs offer the advantage of efficiently treating intended cells without negatively impacting normal cells.
arXiv Detail & Related papers (2023-09-21T13:25:11Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- Bayesian prognostic covariate adjustment [59.75318183140857]
Historical data about disease outcomes can be integrated into the analysis of clinical trials in many ways.
We build on existing literature that uses prognostic scores from a predictive model to increase the efficiency of treatment effect estimates.
arXiv Detail & Related papers (2020-12-24T05:19:03Z)
- Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z)