Improving PTM Site Prediction by Coupling of Multi-Granularity Structure
and Multi-Scale Sequence Representation
- URL: http://arxiv.org/abs/2401.10211v1
- Date: Thu, 4 Jan 2024 20:49:32 GMT
- Title: Improving PTM Site Prediction by Coupling of Multi-Granularity Structure
and Multi-Scale Sequence Representation
- Authors: Zhengyi Li, Menglu Li, Lida Zhu, Wen Zhang
- Abstract summary: Protein post-translational modification (PTM) site prediction is a fundamental task in bioinformatics.
We propose a PTM site prediction method by Coupling of Multi-Granularity structure and Multi-Scale sequence representation.
Extensive experiments on three datasets show that PTM-CMGMS outperforms the state-of-the-art methods.
- Score: 7.337067876477941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein post-translational modification (PTM) site prediction is a
fundamental task in bioinformatics. Several computational methods have been
developed to predict PTM sites. However, existing methods ignore the structure
information and merely utilize protein sequences. Furthermore, designing a more
fine-grained structure representation learning method is urgently needed as PTM
is a biological event that occurs at the atom granularity. In this paper, we
propose a PTM site prediction method by Coupling of Multi-Granularity structure
and Multi-Scale sequence representation, PTM-CMGMS for brevity. Specifically,
multigranularity structure-aware representation learning is designed to learn
neighborhood structure representations at the amino acid, atom, and whole
protein granularity from AlphaFold predicted structures, followed by utilizing
contrastive learning to optimize the structure representations.Additionally,
multi-scale sequence representation learning is used to extract context
sequence information, and motif generated by aligning all context sequences of
PTM sites assists the prediction. Extensive experiments on three datasets show
that PTM-CMGMS outperforms the state-of-the-art methods.
Related papers
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each acid, integrating both sequence and structural information into unified discrete tokens.
arXiv Detail & Related papers (2024-11-04T07:14:28Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation [7.161099050722313]
We develop a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro)
CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes.
We utilize Foldseek to encode protein structures into "structure-sequences" and trained a protein Structural Sequence Language Model, SSLM.
arXiv Detail & Related papers (2024-10-21T02:21:56Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z) - Disentangling Structured Components: Towards Adaptive, Interpretable and
Scalable Time Series Forecasting [52.47493322446537]
We develop a adaptive, interpretable and scalable forecasting framework, which seeks to individually model each component of the spatial-temporal patterns.
SCNN works with a pre-defined generative process of MTS, which arithmetically characterizes the latent structure of the spatial-temporal patterns.
Extensive experiments are conducted to demonstrate that SCNN can achieve superior performance over state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2023-05-22T13:39:44Z) - A Systematic Study of Joint Representation Learning on Protein Sequences
and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z) - Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z) - Interpretable Structured Learning with Sparse Gated Sequence Encoder for
Protein-Protein Interaction Prediction [2.9488233765621295]
Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology.
We present a novel deep framework to model and predict PPIs from sequence alone.
Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences.
arXiv Detail & Related papers (2020-10-16T17:13:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.