pLMFPPred: a novel approach for accurate prediction of functional
peptides integrating embedding from pre-trained protein language model and
imbalanced learning
- URL: http://arxiv.org/abs/2309.14404v1
- Date: Mon, 25 Sep 2023 17:57:39 GMT
- Authors: Zebin Ma, Yonglin Zou, Xiaobin Huang, Wenjin Yan, Hao Xu, Jiexin Yang,
Ying Zhang, Jinqi Huang
- Abstract summary: pLMFPPred is a tool for predicting functional peptides and identifying toxic peptides.
On a validated independent test set, pLMFPPred achieved accuracy, Area under the curve - Receiver Operating Characteristics, and F1-Score values of 0.974, 0.99, and 0.974, respectively.
- Score: 7.5449239162950965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Functional peptides have the potential to treat a variety of diseases. Their
good therapeutic efficacy and low toxicity make them ideal therapeutic agents.
Artificial intelligence-based computational strategies can help quickly
identify new functional peptides from collections of protein sequences and
discover their different functions.Using protein language model-based
embeddings (ESM-2), we developed a tool called pLMFPPred (Protein Language
Model-based Functional Peptide Predictor) for predicting functional peptides
and identifying toxic peptides. We also introduced SMOTE-Tomek data synthesis
sampling and Shapley value-based feature selection techniques to relieve data
imbalance issues and reduce computational costs. On a validated independent
test set, pLMFPPred achieved accuracy, Area under the curve - Receiver
Operating Characteristics, and F1-Score values of 0.974, 0.99, and 0.974,
respectively. Comparative experiments show that pLMFPPred outperforms current
methods for predicting functional peptides. The experimental results suggest
that the proposed method (pLMFPPred) can provide better performance in terms of
Accuracy, Area under the curve - Receiver Operating Characteristics, and
F1-Score than existing methods. pLMFPPred has achieved good performance in
predicting functional peptides and represents a new computational method for
predicting functional peptides.
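The abstract's class-rebalancing step (SMOTE-Tomek) can be illustrated with a toy sketch. This is an illustration only, not the authors' code: the paper uses ESM-2 embeddings as features, and in practice one would use the `SMOTETomek` class from the `imbalanced-learn` package; here a minimal SMOTE-style interpolation is shown in plain NumPy, with the Tomek-link cleanup step omitted for brevity.

```python
import numpy as np

def smote_oversample(X, y, minority_label, k=3, seed=None):
    """Toy SMOTE-style oversampling: synthesize minority-class samples by
    interpolating between a random minority sample and one of its k nearest
    minority-class neighbours, until the two classes are balanced.
    (The Tomek-link cleanup of SMOTE-Tomek is omitted here.)"""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    # number of synthetic samples needed to balance the classes
    n_needed = int((y != minority_label).sum() - (y == minority_label).sum())
    synthetic = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        # distances from sample i to all minority samples; skip index 0 (itself)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new
```

On an ESM-2 embedding matrix, the equivalent production call would be `imblearn.combine.SMOTETomek().fit_resample(X, y)` before training the classifier.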
Related papers
- Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z) - Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction [3.2358123775807575]
Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants.
We present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS).
These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
arXiv Detail & Related papers (2024-05-10T14:50:40Z) - ProtIR: Iterative Refinement between Retrievers and Predictors for
Protein Function Annotation [38.019425619750265]
We introduce a novel variational pseudo-likelihood framework, ProtIR, designed to improve function predictors by incorporating inter-protein similarity modeling.
ProtIR showcases around 10% improvement over vanilla predictor-based methods.
It achieves performance on par with protein language model-based methods, yet without the need for massive pre-training.
arXiv Detail & Related papers (2024-02-10T17:31:46Z) - Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by abundant experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z) - Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Reprogramming Pretrained Language Models for Protein Sequence
Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z) - Using Genetic Programming to Predict and Optimize Protein Function [65.25258357832584]
We propose POET, a computational Genetic Programming tool based on evolutionary methods to enhance screening and mutagenesis in Directed Evolution.
As a proof-of-concept we use peptides that generate MRI contrast detected by the Chemical Exchange Saturation Transfer mechanism.
Our results indicate that a computational modelling tool like POET can help find peptides with 400% better functionality than previously identified peptides.
arXiv Detail & Related papers (2022-02-08T18:08:08Z) - Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation
Method [3.110575781525886]
Some peptides can exhibit low metabolic stability, high toxicity, and high hemolytic activity.
Traditional methods for evaluation of toxicity of peptides can be time-consuming and costly.
We propose a machine learning based method for prediction of hemolytic tendencies of peptides.
arXiv Detail & Related papers (2020-12-11T16:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.