Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding
- URL: http://arxiv.org/abs/2307.09169v1
- Date: Mon, 17 Jul 2023 00:43:33 GMT
- Title: Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding
- Authors: Zihan Liu, Jiaqi Wang, Yun Luo, Shuang Zhao, Wenbin Li, Stan Z. Li
- Abstract summary: This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
- Score: 57.89530563948755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been an explosion of research on the application
of deep learning to the prediction of various peptide properties, due to the
significant development and market potential of peptides. Molecular dynamics
has enabled the efficient collection of large peptide datasets, providing
reliable training data for deep learning. However, the lack of systematic
analysis of the peptide encoding, which is essential for AI-assisted
peptide-related tasks, makes it an urgent problem to be solved for the
improvement of prediction accuracy. To address this issue, we first collect a
high-quality, colossal simulation dataset of peptide self-assembly containing
over 62,000 samples generated by coarse-grained molecular dynamics (CGMD).
Then, we systematically investigate the effect of peptide encoding of amino
acids into sequences and molecular graphs using state-of-the-art sequential
(i.e., RNN, LSTM, and Transformer) and structural deep learning models (i.e.,
GCN, GAT, and GraphSAGE), on the accuracy of peptide self-assembly prediction,
an essential physicochemical process prior to any peptide-related applications.
Extensive benchmarking studies have proven Transformer to be the most powerful
sequence-encoding-based deep learning model, pushing the limit of peptide
self-assembly prediction to decapeptides. In summary, this work provides a
comprehensive benchmark analysis of peptide encoding with advanced deep
learning models, serving as a guide for a wide range of peptide-related
predictions such as isoelectric points, hydration free energy, etc.
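The two encoding families benchmarked in the abstract can be illustrated with a minimal sketch: a peptide is mapped either to a token sequence (for RNN, LSTM, and Transformer models) or to a residue-level graph (for GCN, GAT, and GraphSAGE models). The vocabulary and chain-graph construction below are simplified assumptions for illustration, not the paper's exact preprocessing.

```python
# Sketch of the two peptide encodings compared in the paper:
# (1) token sequence for sequential models, (2) graph for structural models.
# The 20-letter vocabulary and backbone-only edges are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_sequence(peptide: str) -> list[int]:
    """Sequential encoding: map each residue letter to an integer token id."""
    return [TOKEN_IDS[aa] for aa in peptide]

def encode_chain_graph(peptide: str) -> tuple[list[int], list[tuple[int, int]]]:
    """Graph encoding: nodes are residues, edges link backbone neighbors."""
    nodes = encode_sequence(peptide)
    edges = [(i, i + 1) for i in range(len(peptide) - 1)]
    return nodes, edges

nodes, edges = encode_chain_graph("FFKLV")  # a hypothetical pentapeptide
print(nodes)   # [4, 4, 8, 9, 17] -- token ids for F, F, K, L, V
print(edges)   # [(0, 1), (1, 2), (2, 3), (3, 4)] -- backbone adjacency
```

A real molecular-graph encoder would typically operate at the atom level with chemical bond edges; the residue-level chain graph above only conveys the structural idea.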
Related papers
- Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification [53.488387420073536]
Life-Code is a comprehensive framework that spans different biological functions.
Life-Code achieves state-of-the-art performance on various tasks across three omics.
arXiv Detail & Related papers (2025-02-11T06:53:59Z)
- Molecular Fingerprints Are Strong Models for Peptide Function Prediction [0.0]
We study the effectiveness of molecular fingerprints for peptide property prediction.
We demonstrate that domain-specific feature extraction from molecular graphs can outperform complex and computationally expensive models.
arXiv Detail & Related papers (2025-01-29T10:05:27Z)
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
arXiv Detail & Related papers (2024-11-04T07:14:28Z)
- Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z)
- NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present NovoBench, the first unified benchmark for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
- PepGB: Facilitating peptide drug discovery via graph neural networks [36.744839520938825]
We propose PepGB, a deep learning framework to facilitate peptide early drug discovery by predicting peptide-protein interactions (PepPIs).
We derive an extended version, diPepGB, to tackle the bottleneck of modeling highly imbalanced data prevalent in lead generation and optimization processes.
arXiv Detail & Related papers (2024-01-26T06:13:09Z)
- PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding [21.126660909515607]
This study introduces a novel multi-view contrastive learning framework PepHarmony for the sequence-based peptide encoding task.
We carefully select datasets from the Protein Data Bank (PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences and structures.
The experimental data highlights PepHarmony's exceptional capability in capturing the intricate relationship between peptide sequences and structures.
arXiv Detail & Related papers (2024-01-21T01:16:53Z)
- ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing [70.12220342151113]
ContraNovo is a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides.
ContraNovo consistently outshines contemporary state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-18T12:49:46Z)
- Reprogramming Pretrained Language Models for Protein Sequence Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z)
- Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction [2.9488233765621295]
Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology.
We present a novel deep framework to model and predict PPIs from sequence alone.
Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences.
arXiv Detail & Related papers (2020-10-16T17:13:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.