Multimodal Pre-Training Model for Sequence-based Prediction of
Protein-Protein Interaction
- URL: http://arxiv.org/abs/2112.04814v1
- Date: Thu, 9 Dec 2021 10:21:52 GMT
- Title: Multimodal Pre-Training Model for Sequence-based Prediction of
Protein-Protein Interaction
- Authors: Yang Xue, Zijing Liu, Xiaomin Fang, Fan Wang
- Abstract summary: Pre-training a protein model to learn effective representations is critical for modeling protein-protein interactions.
Most pre-training models for PPIs are sequence-based, naively applying language models from natural language processing to amino acid sequences.
We propose a multimodal protein pre-training model with three modalities: sequence, structure, and function.
- Score: 7.022012579173686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein-protein interactions (PPIs) are essential for many biological
processes in which two or more proteins physically bind together to achieve their
functions. Modeling PPIs is useful for many biomedical applications, such as
vaccine design, antibody therapeutics, and peptide drug discovery. Pre-training
a protein model to learn effective representations is critical for PPI prediction. Most
pre-training models for PPIs are sequence-based, naively applying the
language models used in natural language processing to amino acid sequences.
More advanced works use structure-aware pre-training techniques that take
advantage of the contact maps of known protein structures. However, neither
sequences nor contact maps can fully characterize structures and functions of
the proteins, which are closely related to the PPI problem. Inspired by this
insight, we propose a multimodal protein pre-training model with three
modalities: sequence, structure, and function (S2F). Notably, instead of using
contact maps to learn the amino acid-level rigid structures, we encode the
structure feature with the topology complex of point clouds of heavy atoms. It
allows our model to learn structural information about not only the backbones
but also the side chains. Moreover, our model incorporates the knowledge from
the functional description of proteins extracted from literature or manual
annotations. Our experiments show that S2F learns protein embeddings that
achieve good performance on a variety of PPI tasks, including cross-species
PPI prediction, antibody-antigen affinity prediction, antibody neutralization prediction
for SARS-CoV-2, and mutation-driven binding affinity change prediction.
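The key structural idea above (encoding the heavy-atom point cloud through a topology complex rather than a residue-level contact map) can be illustrated with a minimal sketch. The snippet below is not the authors' S2F implementation; it is a hypothetical toy that contrasts a C-alpha contact map with edge counts of a Vietoris-Rips filtration over all heavy atoms. A real pipeline would compute persistent-homology features with a dedicated TDA library.

```python
import numpy as np

def contact_map(ca_coords, cutoff=8.0):
    """Residue-level contact map from C-alpha coordinates (the baseline
    representation the paper moves beyond): 1 where two residues lie
    within `cutoff` angstroms of each other."""
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    return (d < cutoff).astype(int)

def rips_edge_counts(atom_coords, radii=(2.0, 4.0, 6.0, 8.0)):
    """Toy stand-in for a topology-complex feature: for each filtration
    radius, count the edges of the Vietoris-Rips complex built over the
    heavy-atom point cloud. Genuine persistent-homology descriptors
    (e.g. Betti numbers across the filtration) would come from a TDA
    library; this only illustrates the filtration idea."""
    d = np.linalg.norm(atom_coords[:, None, :] - atom_coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(atom_coords), k=1)
    pair_d = d[iu]
    return [int((pair_d <= r).sum()) for r in radii]

# Tiny synthetic "protein": 4 atoms on a line, 3 angstroms apart.
atoms = np.array([[0.0, 0, 0], [3.0, 0, 0], [6.0, 0, 0], [9.0, 0, 0]])
print(rips_edge_counts(atoms))  # → [0, 3, 5, 5]
```

Because the filtration sees every heavy atom, side-chain geometry contributes to the features, whereas a contact map only reflects residue-level proximity.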
Related papers
- Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance? [4.7077642423577775]
We propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation.
Our method outperforms existing sequence- and structure-based representation learning methods by more quickly and accurately matching structurally consistent protein domains.
arXiv Detail & Related papers (2024-06-28T08:54:37Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose ProLLM, a novel framework that employs an LLM tailored for PPI prediction for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding [82.31506767274841]
Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities.
MAPE-PPI encodes microenvironments into chemically meaningful discrete codes via a sufficiently large microenvironment "vocabulary".
MAPE-PPI scales to prediction over millions of PPIs with a superior trade-off between effectiveness and computational efficiency.
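The "chemically meaningful discrete codes" above suggest a vector-quantization step: each continuous microenvironment embedding is snapped to its nearest entry in a learned codebook. The sketch below is a hypothetical illustration only; the codebook size, embedding dimensionality, and nearest-neighbor rule are assumptions, not details taken from the paper.

```python
import numpy as np

def quantize_microenvironments(embeddings, codebook):
    """Toy vector-quantization step: map each continuous microenvironment
    embedding to the index of its nearest codebook entry, yielding a
    discrete code per microenvironment. (Illustrative only; the actual
    MAPE-PPI codebook and its training procedure differ.)"""
    # Squared distances between every embedding and every code vector.
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 8))  # assumed vocabulary of 256 codes
# Three embeddings that sit near codes 3, 17, and 3, plus small noise.
envs = codebook[[3, 17, 3]] + 0.01 * rng.normal(size=(3, 8))
print(quantize_microenvironments(envs, codebook))  # expected to recover [3 17 3]
```

Downstream, the PPI model consumes these integer codes instead of raw coordinates, which is what makes the large-scale trade-off between effectiveness and efficiency possible.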
arXiv Detail & Related papers (2024-02-22T09:04:41Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z)
- Endowing Protein Language Models with Structural Knowledge [5.587293092389789]
We introduce a novel framework that enhances protein language models by integrating protein structural data.
The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database.
PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv Detail & Related papers (2024-01-26T12:47:54Z)
- Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design [12.585697288315846]
We propose a model to jointly design protein sequence and structure based on automatically detected functional sites.
NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence.
Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors.
arXiv Detail & Related papers (2023-10-06T16:08:41Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We perform structural surgery on pLMs: a lightweight structural adapter is implanted into a pLM and endows it with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Deep Learning Methods for Protein Family Classification on PDB Sequencing Data [0.0]
We demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data.
Our results show that our deep learning models deliver superior performance to classical machine learning methods, with the convolutional architecture providing the most impressive inference performance.
arXiv Detail & Related papers (2022-07-14T06:11:32Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction [2.697420611471228]
We present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces.
The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex.
DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid.
arXiv Detail & Related papers (2021-06-06T23:56:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.