GLProtein: Global-and-Local Structure Aware Protein Representation Learning
- URL: http://arxiv.org/abs/2506.06294v1
- Date: Sat, 17 May 2025 14:45:13 GMT
- Authors: Yunqing Liu, Wenqi Fan, Xiaoyong Wei, Qing Li
- Abstract summary: We argue that the structural information of proteins is not limited to their 3D coordinates but also spans information from amino acid molecules (local information) to protein-protein structure similarity (global information). We propose GLProtein, the first protein pre-training framework that incorporates both global structural similarity and local amino acid details to enhance prediction accuracy and functional insights.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proteins are central to biological systems, serving as building blocks across all forms of life. Despite advances in understanding protein function through sequence analysis, there remains potential for further integration of protein structural information. We argue that the structural information of proteins is not limited to their 3D coordinates but also spans information from amino acid molecules (local information) to protein-protein structure similarity (global information). To address this, we propose GLProtein, the first protein pre-training framework that incorporates both global structural similarity and local amino acid details to enhance prediction accuracy and functional insights. GLProtein innovatively combines protein masked modelling with triplet structure similarity scoring, protein 3D distance encoding and substructure-based amino acid molecule encoding. Experimental results demonstrate that GLProtein outperforms previous methods on several bioinformatics tasks, including protein-protein interaction prediction and contact prediction.
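The abstract names two of GLProtein's structural signals: triplet structure similarity scoring (a global signal) and protein 3D distance encoding (a local one). A minimal NumPy sketch of both ideas follows; it is an illustration of the general techniques, not the paper's actual implementation, and the function names, margin, and distance-bin edges are assumptions chosen for the example:

```python
import numpy as np

def triplet_similarity_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on protein-level embeddings: the anchor
    protein should embed closer to a structurally similar protein
    (positive) than to a dissimilar one (negative), by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def distance_encoding(ca_coords, bins=(2.0, 4.0, 6.0, 8.0, 12.0, 16.0, 20.0)):
    """Discretise the pairwise C-alpha distance matrix into integer bins,
    a common way to expose 3D geometry to a sequence model."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # (n, n) distances in angstroms
    return np.digitize(dist, bins)         # (n, n) integer bin indices

# Toy example: three residues on a line, 5 angstroms apart.
coords = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
enc = distance_encoding(coords)
```

In a pre-training loop, the triplet loss would be summed over mined triplets and the binned distance matrix fed to the encoder as pairwise features alongside the masked-modelling objective.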
Related papers
- Protein Large Language Models: A Comprehensive Survey
Protein-specific large language models (Protein LLMs) are revolutionizing protein science by enabling more efficient protein structure prediction, function annotation, and design. This work provides the first comprehensive overview of Protein LLMs, covering their architectures, training datasets, evaluation metrics, and diverse applications.
arXiv Detail & Related papers (2025-02-21T19:22:10Z)
- A Survey of Deep Learning Methods in Protein Bioinformatics and its Impact on Protein Design
Deep learning has demonstrated remarkable performance in fields such as computer vision and natural language processing. In recent years it has been applied, with great success, to the data-rich domain of protein sequences. The performance improvements achieved by deep learning unlock new possibilities in the field of protein bioinformatics.
arXiv Detail & Related papers (2025-01-02T05:21:34Z)
- Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs
We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures.
We experimentally show that our proposed pretraining strategy leads to significant improvements of up to 6%.
arXiv Detail & Related papers (2024-06-20T09:34:31Z)
- ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
Language Models (LMs) excel in understanding textual descriptions of proteins.
Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to process texts.
We introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding.
arXiv Detail & Related papers (2024-05-21T08:06:13Z)
- A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration
This paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which combines convolutional neural networks (CNN) and a supervised Transformer protein language model for single-sequence protein structure prediction.
The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.
arXiv Detail & Related papers (2024-02-29T12:24:20Z)
- A Text-guided Protein Design Framework
We propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three subsequent steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator that generates the protein representation from the text modality; and a decoder that creates protein sequences from that representation. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) the best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
arXiv Detail & Related papers (2023-02-09T12:59:16Z)
- Multi-modal Protein Knowledge Graph Construction and Applications
We create ProteinKG65, a knowledge graph for protein science.
Using the Gene Ontology and the UniProt knowledge base as a basis, we transform various kinds of knowledge with aligned descriptions and protein sequences.
ProteinKG65 is mainly dedicated to providing a specialized protein knowledge graph, bringing the knowledge of Gene Ontology to protein function and structure prediction.
arXiv Detail & Related papers (2022-05-27T08:18:56Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Generative De Novo Protein Design with Global Context
The inverse problem of protein structure prediction aims to obtain a novel protein sequence that folds into a defined structure.
Recent works on computational protein design have studied designing sequences for the desired backbone structure with local positional information.
We propose the Global-Context Aware generative de novo protein design method (GCA), consisting of local and global modules.
arXiv Detail & Related papers (2022-04-21T02:55:01Z)
- Structure-aware Protein Self-supervised Learning
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z)
- BERTology Meets Biology: Interpreting Attention in Protein Language Models
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
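The BERTology entry above reports that some attention heads connect residue pairs that are distant in sequence but close in 3D. One common way to quantify such a claim is a contact-precision check on a head's attention map; the sketch below is a generic NumPy illustration, not the paper's code, and the function name, 8-angstrom contact threshold, and sequence-separation cutoff are conventional assumptions:

```python
import numpy as np

def attention_contact_precision(attn, ca_coords, top_k=5,
                                contact_thresh=8.0, min_sep=6):
    """Fraction of the top-k most-attended residue pairs (at sequence
    separation >= min_sep) that are true contacts, i.e. whose C-alpha
    atoms lie within contact_thresh angstroms of each other."""
    n = attn.shape[0]
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (n, n) pairwise distances
    # Candidate pairs: far apart in the sequence.
    pairs = [(float(attn[i, j]), i, j)
             for i in range(n) for j in range(n) if j - i >= min_sep]
    pairs.sort(reverse=True)                        # highest attention first
    top = pairs[:top_k]
    hits = sum(dist[i, j] < contact_thresh for _, i, j in top)
    return hits / len(top)
```

Averaging this score over a set of proteins, per head, identifies which heads (if any) specialize in structural contacts.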
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.