pLMFPPred: a novel approach for accurate prediction of functional
peptides integrating embedding from pre-trained protein language model and
imbalanced learning
- URL: http://arxiv.org/abs/2309.14404v1
- Date: Mon, 25 Sep 2023 17:57:39 GMT
- Authors: Zebin Ma, Yonglin Zou, Xiaobin Huang, Wenjin Yan, Hao Xu, Jiexin Yang,
Ying Zhang, Jinqi Huang
- Abstract summary: pLMFPPred is a tool for predicting functional peptides and identifying toxic peptides.
On a validated independent test set, pLMFPPred achieved accuracy, Area under the curve - Receiver Operating Characteristics, and F1-Score values of 0.974, 0.99, and 0.974, respectively.
- Score: 7.5449239162950965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Functional peptides have the potential to treat a variety of diseases. Their
good therapeutic efficacy and low toxicity make them ideal therapeutic agents.
Artificial intelligence-based computational strategies can help quickly
identify new functional peptides from collections of protein sequences and
discover their different functions.Using protein language model-based
embeddings (ESM-2), we developed a tool called pLMFPPred (Protein Language
Model-based Functional Peptide Predictor) for predicting functional peptides
and identifying toxic peptides. We also introduced SMOTE-Tomek data synthesis
sampling and Shapley value-based feature selection techniques to relieve data
imbalance issues and reduce computational costs. On a validated independent
test set, pLMFPPred achieved accuracy, Area under the curve - Receiver
Operating Characteristics, and F1-Score values of 0.974, 0.99, and 0.974,
respectively. Comparative experiments show that pLMFPPred outperforms current
methods for predicting functional peptides. The experimental results suggest
that the proposed method (pLMFPPred) can provide better performance in terms of
Accuracy, Area under the curve - Receiver Operating Characteristics, and
F1-Score than existing methods. pLMFPPred has achieved good performance in
predicting functional peptides and represents a new computational method for
predicting functional peptides.
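The abstract's class-rebalancing step (SMOTE-Tomek) can be illustrated with a toy sketch. This is an illustration only, not the authors' code: the paper uses ESM-2 embeddings as features, and in practice one would use the `SMOTETomek` class from the `imbalanced-learn` package; here a minimal SMOTE-style interpolation is shown in plain NumPy, with the Tomek-link cleanup step omitted for brevity.

```python
import numpy as np

def smote_oversample(X, y, minority_label, k=3, seed=None):
    """Toy SMOTE-style oversampling: synthesize minority-class samples by
    interpolating between a random minority sample and one of its k nearest
    minority-class neighbours, until the two classes are balanced.
    (The Tomek-link cleanup of SMOTE-Tomek is omitted here.)"""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    # number of synthetic samples needed to balance the classes
    n_needed = int((y != minority_label).sum() - (y == minority_label).sum())
    synthetic = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        # distances from sample i to all minority samples; skip index 0 (itself)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new
```

On an ESM-2 embedding matrix, the equivalent production call would be `imblearn.combine.SMOTETomek().fit_resample(X, y)` before training the classifier.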
Related papers
- Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z) - Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction [3.2358123775807575]
Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants.
We present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS).
These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
arXiv Detail & Related papers (2024-05-10T14:50:40Z) - ProtIR: Iterative Refinement between Retrievers and Predictors for
Protein Function Annotation [38.019425619750265]
We introduce a novel variational pseudo-likelihood framework, ProtIR, designed to improve function predictors by incorporating inter-protein similarity modeling.
ProtIR showcases around 10% improvement over vanilla predictor-based methods.
It achieves performance on par with protein language model-based methods, yet without the need for massive pre-training.
arXiv Detail & Related papers (2024-02-10T17:31:46Z) - Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by abundant experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z) - Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Reprogramming Pretrained Language Models for Protein Sequence
Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z) - Using Genetic Programming to Predict and Optimize Protein Function [65.25258357832584]
We propose POET, a computational Genetic Programming tool based on evolutionary methods to enhance screening and mutagenesis in Directed Evolution.
As a proof-of-concept we use peptides that generate MRI contrast detected by the Chemical Exchange Saturation Transfer mechanism.
Our results indicate that a computational modelling tool like POET can help find peptides with 400% better functionality than previously identified peptides.
arXiv Detail & Related papers (2022-02-08T18:08:08Z) - Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation
Method [3.110575781525886]
Some peptides can exhibit low metabolic stability, high toxicity, and high hemolytic activity.
Traditional methods for evaluation of toxicity of peptides can be time-consuming and costly.
We propose a machine learning based method for prediction of hemolytic tendencies of peptides.
arXiv Detail & Related papers (2020-12-11T16:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.