PepLand: a large-scale pre-trained peptide representation model for a
comprehensive landscape of both canonical and non-canonical amino acids
- URL: http://arxiv.org/abs/2311.04419v1
- Date: Wed, 8 Nov 2023 01:18:32 GMT
- Title: PepLand: a large-scale pre-trained peptide representation model for a
comprehensive landscape of both canonical and non-canonical amino acids
- Authors: Ruochi Zhang (1,2,3), Haoran Wu (3), Yuting Xiu (3), Kewei Li (1,4),
Ningning Chen (3), Yu Wang (3), Yan Wang (1,2,4), Xin Gao (5,6,7), Fengfeng
Zhou (1,4,7) ((1) Key Laboratory of Symbolic Computation and Knowledge
Engineering of Ministry of Education, Jilin University, Changchun, China. (2)
School of Artificial Intelligence, Jilin University, Changchun, China. (3)
Syneron Technology, Guangzhou, China. (4) College of Computer Science and
Technology, Jilin University, Changchun, China. (5) Computational Bioscience
Research Center, King Abdullah University of Science and Technology (KAUST),
Thuwal, Saudi Arabia. (6) Computer Science Program, Computer, Electrical and
Mathematical Sciences and Engineering Division, King Abdullah University of
Science and Technology (KAUST), Thuwal, Saudi Arabia. (7) Corresponding
Authors)
- Abstract summary: PepLand is a novel pre-training architecture for representation and property analysis of peptides spanning both canonical and non-canonical amino acids.
In essence, PepLand leverages a comprehensive multi-view heterogeneous graph neural network tailored to unveil the subtle structural representations of peptides.
- Score: 0.4348327622270753
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, the scientific community has become increasingly interested
in peptides with non-canonical amino acids due to their superior stability and
resistance to proteolytic degradation. These peptides present promising
modifications to biological, pharmacological, and physicochemical attributes in
both endogenous and engineered peptides. Notwithstanding their considerable
advantages, the scientific community still lacks an
effective pre-trained model adept at distilling feature representations from
such complex peptide sequences. We herein propose PepLand, a novel pre-training
architecture for representation and property analysis of peptides spanning both
canonical and non-canonical amino acids. In essence, PepLand leverages a
comprehensive multi-view heterogeneous graph neural network tailored to unveil
the subtle structural representations of peptides. Empirical validations
underscore PepLand's effectiveness across an array of peptide property
predictions, encompassing protein-protein interactions, permeability,
solubility, and synthesizability. The rigorous evaluation confirms PepLand's
unparalleled capability in capturing salient synthetic peptide features,
thereby laying a robust foundation for transformative advances in
peptide-centric research domains. We have made all the source code utilized in
this study publicly accessible via GitHub at
https://github.com/zhangruochi/pepland
Related papers
- Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z)
- NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark NovoBench for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
- Full-Atom Peptide Design based on Multi-modal Flow Matching [32.58558711545861]
We present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides.
We characterize the peptide structure using rigid backbone frames within the $\mathrm{SE}(3)$ manifold and side-chain angles on high-dimensional tori.
Our approach adeptly tackles various tasks such as fix-backbone sequence design and side-chain packing through partial sampling.
arXiv Detail & Related papers (2024-06-02T12:59:54Z)
- AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information [46.23980841020632]
We propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide.
AdaNovo excels in identifying amino acids with post-translational modifications (PTMs) and exhibits robustness against data noise.
arXiv Detail & Related papers (2024-03-09T11:54:58Z)
- PPFlow: Target-aware Peptide Design with Torsional Flow Matching [52.567714059931646]
We propose a target-aware peptide design method called PPFlow to model the internal geometries of torsion angles for the peptide structure design.
Besides, we establish a protein-peptide binding dataset named PPBench2024 to fill the void of massive data for the task of structure-based peptide drug design.
arXiv Detail & Related papers (2024-03-05T13:26:42Z)
- PepGB: Facilitating peptide drug discovery via graph neural networks [36.744839520938825]
We propose PepGB, a deep learning framework to facilitate peptide early drug discovery by predicting peptide-protein interactions (PepPIs).
We derive an extended version, diPepGB, to tackle the bottleneck of modeling highly imbalanced data prevalent in lead generation and optimization processes.
arXiv Detail & Related papers (2024-01-26T06:13:09Z)
- PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding [21.126660909515607]
This study introduces a novel multi-view contrastive learning framework PepHarmony for the sequence-based peptide encoding task.
We carefully select datasets from the Protein Data Bank (PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences and structures.
The experimental data highlights PepHarmony's exceptional capability in capturing the intricate relationship between peptide sequences and structures.
arXiv Detail & Related papers (2024-01-21T01:16:53Z)
- ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing [70.12220342151113]
ContraNovo is a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides.
ContraNovo consistently outshines contemporary state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-18T12:49:46Z)
- Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning [67.66393016797181]
We propose a peptide co-modeling method, RepCon, to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models.
RepCon learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs.
Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework.
arXiv Detail & Related papers (2023-10-04T16:58:25Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)