PDeepPP: A Deep learning framework with Pretrained Protein language for peptide classification
 - URL: http://arxiv.org/abs/2502.15610v1
 - Date: Fri, 21 Feb 2025 17:31:22 GMT
 - Title: PDeepPP: A Deep learning framework with Pretrained Protein language for peptide classification
 - Authors: Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Xueying Wang, Dan Huang, 
 - Abstract summary: We propose a deep learning framework that integrates pretrained protein language models with a neural network combining transformer and CNN for peptide classification. This framework was applied to multiple tasks involving PTM site and bioactive peptide prediction, utilizing large-scale datasets to enhance the model's robustness.
 - Score: 6.55419985735241
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract: Protein post-translational modifications (PTMs) and bioactive peptides (BPs) play critical roles in various biological processes and have significant therapeutic potential. However, identifying PTM sites and bioactive peptides through experimental methods is often labor-intensive, costly, and time-consuming. As a result, computational tools, particularly those based on deep learning, have become effective solutions for predicting PTM sites and peptide bioactivity. Despite progress in this field, existing methods still struggle with the complexity of protein sequences and the challenge of producing high-quality predictions across diverse datasets. To address these issues, we propose a deep learning framework that integrates pretrained protein language models with a neural network combining transformer and CNN for peptide classification. By leveraging the ability of pretrained models to capture complex relationships within protein sequences, combined with the predictive power of parallel networks, our approach improves feature extraction while enhancing prediction accuracy. This framework was applied to multiple tasks involving PTM site and bioactive peptide prediction, utilizing large-scale datasets to enhance the model's robustness. In a comparison across 33 tasks, the model achieved state-of-the-art (SOTA) performance in 25 of them, surpassing existing methods and demonstrating its versatility across different datasets. Our results suggest that this approach provides a scalable and effective solution for large-scale peptide discovery and PTM analysis, paving the way for more efficient peptide classification and functional annotation.
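To make the parallel architecture described above concrete, below is a minimal PyTorch sketch of how such a classifier could be assembled: per-residue embeddings from a pretrained protein language model (e.g. ESM-2) feed two parallel branches, a small transformer encoder for global sequence context and a 1D CNN for local motif-like patterns, whose pooled outputs are concatenated for classification. The embedding dimension, layer sizes, and mean pooling are illustrative assumptions, not the exact PDeepPP configuration.

```python
import torch
import torch.nn as nn


class ParallelTransformerCNN(nn.Module):
    """Sketch of a parallel transformer + CNN head over pretrained PLM embeddings."""

    def __init__(self, embed_dim: int = 1280, hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        # Project PLM embeddings (e.g. 1280-dim ESM-2 features) to a shared width.
        self.proj = nn.Linear(embed_dim, hidden_dim)
        # Transformer branch: models long-range dependencies across the whole window.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, dim_feedforward=512, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # CNN branch: captures local, motif-like patterns around each residue.
        self.cnn = nn.Sequential(
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Concatenated pooled features from both branches feed the classifier.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, plm_embeddings: torch.Tensor) -> torch.Tensor:
        # plm_embeddings: (batch, seq_len, embed_dim) per-residue features.
        x = self.proj(plm_embeddings)
        global_feat = self.transformer(x).mean(dim=1)         # (batch, hidden_dim)
        local_feat = self.cnn(x.transpose(1, 2)).mean(dim=2)  # (batch, hidden_dim)
        return self.classifier(torch.cat([global_feat, local_feat], dim=-1))


if __name__ == "__main__":
    # Toy batch: 4 peptide windows of length 33 with 1280-dim PLM embeddings.
    model = ParallelTransformerCNN()
    logits = model(torch.randn(4, 33, 1280))
    print(logits.shape)  # torch.Size([4, 2])
```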
 
       
      
        Related papers
 - ResCap-DBP: A Lightweight Residual-Capsule Network for Accurate DNA-Binding Protein Prediction Using Global ProteinBERT Embeddings [9.626183317998143]
We propose a novel deep learning framework, ResCap-DBP, that combines a residual learning-based encoder with a one-dimensional Capsule Network. ProteinBERT embeddings substantially outperform other representations on large datasets. Our model consistently outperforms current state-of-the-art methods.
arXiv  Detail & Related papers  (2025-07-27T21:54:32Z) - PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv  Detail & Related papers  (2025-07-07T15:21:05Z) - ProtCLIP: Function-Informed Protein Multi-Modal Learning [18.61302416993122]
We develop ProtCLIP, a multi-modality foundation model that learns function-aware protein embeddings. ProtCLIP consistently achieves SOTA performance, with remarkable improvements of 75% on average across five cross-modal transformation benchmarks. The experimental results verify the extraordinary potential of ProtCLIP as a protein multi-modality foundation model.
arXiv  Detail & Related papers  (2024-12-28T04:23:47Z) - Multi-modal Representation Learning Enables Accurate Protein Function Prediction in Low-Data Setting [0.0]
HOPER (HOlistic ProtEin Representation) is a novel framework designed to enhance protein function prediction (PFP) in low-data settings. Our results highlight the effectiveness of multimodal representation learning for overcoming data limitations in biological research.
arXiv  Detail & Related papers  (2024-11-22T20:13:55Z) - MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
arXiv  Detail & Related papers  (2024-11-04T07:14:28Z) - Peptide-GPT: Generative Design of Peptides using Generative Pre-trained Transformers and Bio-informatic Supervision [7.275932354889042]
We introduce a protein language model tailored to generate protein sequences with distinct properties.
We rank the generated sequences by their perplexity scores and then filter out those lying outside the permissible convex hull of proteins.
We achieved an accuracy of 76.26% in hemolytic, 72.46% in non-hemolytic, 78.84% in non-fouling, and 68.06% in solubility protein generation.
arXiv  Detail & Related papers  (2024-10-25T00:15:39Z) - Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv  Detail & Related papers  (2024-07-02T20:13:47Z) - ContactNet: Geometric-Based Deep Learning Model for Predicting Protein-Protein Interactions [2.874893537471256]
We develop a novel attention-based Graph Neural Network (GNN), ContactNet, for classifying PPI models into accurate and incorrect ones.
When trained on docked antigen and modeled antibody structures, ContactNet doubles the accuracy of current state-of-the-art scoring functions.
arXiv  Detail & Related papers  (2024-06-26T12:54:41Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present NovoBench, the first unified benchmark for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and π-HelixNovo, are integrated into our framework.
arXiv  Detail & Related papers  (2024-06-16T08:23:21Z) - PPFlow: Target-aware Peptide Design with Torsional Flow Matching [52.567714059931646]
We propose a target-aware peptide design method called PPFlow to model the internal geometries of torsion angles for peptide structure design.
Besides, we establish a protein-peptide binding dataset named PPBench2024 to address the lack of large-scale data for structure-based peptide drug design.
arXiv  Detail & Related papers  (2024-03-05T13:26:42Z) - MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding [82.31506767274841]
Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities.
MAPE-PPI encodes microenvironments into chemically meaningful discrete codes via a sufficiently large microenvironment "vocabulary".
MAPE-PPI can scale to PPI prediction with millions of PPIs with superior trade-offs between effectiveness and computational efficiency.
arXiv  Detail & Related papers  (2024-02-22T09:04:41Z) - Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict thermostability changes in proteins upon single-point mutations.
arXiv  Detail & Related papers  (2023-12-07T03:25:49Z) - PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings [2.971764950146918]
We develop PTransIPs, a new deep learning framework for the identification of phosphorylation sites.
PTransIPs outperforms existing state-of-the-art (SOTA) methods, achieving AUCs of 0.9232 and 0.9660.
arXiv  Detail & Related papers  (2023-08-08T07:50:38Z) - Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv  Detail & Related papers  (2023-07-17T00:43:33Z) - State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv  Detail & Related papers  (2022-09-30T01:46:38Z) - Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction [28.630603355510324]
We present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor MSA targets.
By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime.
arXiv  Detail & Related papers  (2022-08-20T10:23:17Z) - From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv  Detail & Related papers  (2022-08-19T14:55:12Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2, identifying 83.1% of active drugs that have been validated by wet-lab experiments, with near-native predicted binding poses.
arXiv  Detail & Related papers  (2021-10-14T13:28:02Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv  Detail & Related papers  (2021-05-11T03:40:29Z) - Assigning function to protein-protein interactions: a weakly supervised BioBERT based approach using PubMed abstracts [2.208694022993555]
Protein-protein interactions (PPI) are critical to the function of proteins in both normal and diseased cells.
Only a small percentage of PPIs captured in protein interaction databases have annotations of function available.
Here, we aim to label the function type of PPIs by extracting relationships described in PubMed abstracts.
arXiv  Detail & Related papers  (2020-08-20T01:42:28Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     