PepLand: a large-scale pre-trained peptide representation model for a
comprehensive landscape of both canonical and non-canonical amino acids
- URL: http://arxiv.org/abs/2311.04419v1
- Date: Wed, 8 Nov 2023 01:18:32 GMT
- Title: PepLand: a large-scale pre-trained peptide representation model for a
comprehensive landscape of both canonical and non-canonical amino acids
- Authors: Ruochi Zhang (1,2,3), Haoran Wu (3), Yuting Xiu (3), Kewei Li (1,4),
Ningning Chen (3), Yu Wang (3), Yan Wang (1,2,4), Xin Gao (5,6,7), Fengfeng
Zhou (1,4,7) ((1) Key Laboratory of Symbolic Computation and Knowledge
Engineering of Ministry of Education, Jilin University, Changchun, China. (2)
School of Artificial Intelligence, Jilin University, Changchun, China. (3)
Syneron Technology, Guangzhou, China. (4) College of Computer Science and
Technology, Jilin University, Changchun, China. (5) Computational Bioscience
Research Center, King Abdullah University of Science and Technology (KAUST),
Thuwal, Saudi Arabia. (6) Computer Science Program, Computer, Electrical and
Mathematical Sciences and Engineering Division, King Abdullah University of
Science and Technology (KAUST), Thuwal, Saudi Arabia. (7) Corresponding
Authors)
- Abstract summary: PepLand is a novel pre-training architecture for representation and property analysis of peptides spanning both canonical and non-canonical amino acids.
In essence, PepLand leverages a comprehensive multi-view heterogeneous graph neural network tailored to unveil the subtle structural representations of peptides.
- Score: 0.4348327622270753
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, the scientific community has become increasingly interested
in peptides with non-canonical amino acids due to their superior stability and
resistance to proteolytic degradation. Such modifications promise improved
biological, pharmacological, and physicochemical properties in both endogenous
and engineered peptides. Notwithstanding these considerable advantages, the
field still lacks an effective pre-trained model capable of distilling feature
representations from such complex peptide sequences. We herein propose PepLand, a novel pre-training
architecture for representation and property analysis of peptides spanning both
canonical and non-canonical amino acids. In essence, PepLand leverages a
comprehensive multi-view heterogeneous graph neural network tailored to unveil
the subtle structural representations of peptides. Empirical validations
underscore PepLand's effectiveness across an array of peptide property
predictions, encompassing protein-protein interactions, permeability,
solubility, and synthesizability. The rigorous evaluation confirms PepLand's
unparalleled capability in capturing salient synthetic peptide features,
thereby laying a robust foundation for transformative advances in
peptide-centric research domains. We have made all the source code utilized in
this study publicly accessible via GitHub at
https://github.com/zhangruochi/pepland
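For readers who want a concrete picture of the multi-view graph idea described in the abstract, the sketch below encodes a peptide with two toy graph views (a residue-level backbone graph and an atom-level fragment graph), pools each view, and fuses the results into a single embedding. This is a minimal illustration written for this summary, not PepLand's actual architecture: the layer sizes, feature dimensions, mean-aggregation message passing, and concatenation-based fusion are all assumptions made for the example.

```python
# Minimal two-view peptide graph encoder (illustrative sketch only; see the
# official repository above for the authors' implementation).
import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    """One round of mean-aggregation message passing over a dense adjacency matrix."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Add self-loops and row-normalize so each node averages itself and its neighbours.
        adj = adj + torch.eye(adj.size(0))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj / deg) @ x))


class TwoViewPeptideEncoder(nn.Module):
    """Hypothetical two-view encoder: residue view + atom view, fused by concatenation."""

    def __init__(self, res_feat: int, atom_feat: int, hidden: int = 64):
        super().__init__()
        self.res_gnn = SimpleGraphConv(res_feat, hidden)
        self.atom_gnn = SimpleGraphConv(atom_feat, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, res_x, res_adj, atom_x, atom_adj) -> torch.Tensor:
        res_h = self.res_gnn(res_x, res_adj).mean(dim=0)      # pooled residue-view embedding
        atom_h = self.atom_gnn(atom_x, atom_adj).mean(dim=0)  # pooled atom-view embedding
        return self.fuse(torch.cat([res_h, atom_h], dim=-1))  # fused peptide embedding


if __name__ == "__main__":
    # Toy example: a 5-residue peptide backbone and a 12-atom fragment graph.
    n_res, n_atom = 5, 12
    res_x = torch.randn(n_res, 21)    # placeholder residue features (e.g. one-hot identity)
    atom_x = torch.randn(n_atom, 16)  # placeholder atom-type features
    res_adj = torch.diag(torch.ones(n_res - 1), 1)
    res_adj = res_adj + res_adj.T                      # linear backbone connectivity
    atom_adj = (torch.rand(n_atom, n_atom) > 0.7).float()
    atom_adj = ((atom_adj + atom_adj.T) > 0).float()   # symmetric toy bond graph
    emb = TwoViewPeptideEncoder(21, 16)(res_x, res_adj, atom_x, atom_adj)
    print(emb.shape)  # torch.Size([64])
```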
Related papers
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
arXiv Detail & Related papers (2024-11-04T07:14:28Z)
- PepINVENT: Generative peptide design beyond the natural amino acids [34.04968462561752]
PepINVENT navigates the vast space of natural and non-natural amino acids to propose valid, novel, and diverse peptide designs.
PepINVENT coupled with reinforcement learning enables the goal-oriented design of peptides using its chemistry-informed generative capabilities.
arXiv Detail & Related papers (2024-09-21T06:53:03Z)
- Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z)
- NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and π-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
- AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information [46.23980841020632]
We propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide.
AdaNovo excels in identifying amino acids with post-translational modifications (PTMs) and exhibits robustness against data noise.
arXiv Detail & Related papers (2024-03-09T11:54:58Z)
- PepGB: Facilitating peptide drug discovery via graph neural networks [36.744839520938825]
We propose PepGB, a deep learning framework that facilitates early-stage peptide drug discovery by predicting peptide-protein interactions (PepPIs).
We derive an extended version, diPepGB, to tackle the bottleneck of modeling highly imbalanced data prevalent in lead generation and optimization processes.
arXiv Detail & Related papers (2024-01-26T06:13:09Z)
- PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding [21.126660909515607]
This study introduces a novel multi-view contrastive learning framework PepHarmony for the sequence-based peptide encoding task.
We carefully select datasets from the Protein Data Bank (PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences and structures.
The experimental data highlights PepHarmony's exceptional capability in capturing the intricate relationship between peptide sequences and structures.
arXiv Detail & Related papers (2024-01-21T01:16:53Z)
- ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing [70.12220342151113]
ContraNovo is a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides.
ContraNovo consistently outshines contemporary state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-18T12:49:46Z)
- Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning [67.66393016797181]
We propose a peptide co-modeling method, RepCon, to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models.
RepCon learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs.
Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework.
arXiv Detail & Related papers (2023-10-04T16:58:25Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)