MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction
Prediction via Microenvironment-Aware Protein Embedding
- URL: http://arxiv.org/abs/2402.14391v1
- Date: Thu, 22 Feb 2024 09:04:41 GMT
- Title: MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction
Prediction via Microenvironment-Aware Protein Embedding
- Authors: Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V
Chawla, Stan Z. Li
- Abstract summary: Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities.
MAPE-PPI encodes microenvironments into chemically meaningful discrete codes via a sufficiently large microenvironment "vocabulary".
MAPE-PPI can scale to PPI prediction with millions of PPIs with superior trade-offs between effectiveness and computational efficiency.
- Score: 82.31506767274841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein-Protein Interactions (PPIs) are fundamental in various biological
processes and play a key role in life activities. The growing demand and cost
of experimental PPI assays require computational methods for efficient PPI
prediction. While existing methods rely heavily on protein sequence for PPI
prediction, it is the protein structure that is the key to determining the
interactions. To take both protein modalities into account, we define the
microenvironment of an amino acid residue by its sequence and structural
contexts, which describe the surrounding chemical properties and geometric
features. In addition, microenvironments defined in previous work are largely
based on experimentally assayed physicochemical properties, for which the
"vocabulary" is usually extremely small. This makes it difficult to cover the
diversity and complexity of microenvironments. In this paper, we propose
Microenvironment-Aware Protein Embedding for PPI prediction (MAPE-PPI), which
encodes microenvironments into chemically meaningful discrete codes via a
sufficiently large microenvironment "vocabulary" (i.e., codebook). Moreover, we
propose a novel pre-training strategy, namely Masked Codebook Modeling (MCM),
to capture the dependencies between different microenvironments by randomly
masking the codebook and reconstructing the input. With the learned
microenvironment codebook, we can reuse it as an off-the-shelf tool to
efficiently and effectively encode proteins of different sizes and functions
for large-scale PPI prediction. Extensive experiments show that MAPE-PPI can
scale to PPI prediction with millions of PPIs and achieves a better trade-off
between effectiveness and computational efficiency than state-of-the-art
competitors.
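The codebook idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain nearest-neighbour vector-quantization lookup, toy 2-D residue embeddings, and one plausible reading of Masked Codebook Modeling in which a random subset of code indices is hidden for later reconstruction. The names `quantize`, `mask_codes`, `mask_ratio`, and `mask_id` are hypothetical.

```python
import random


def quantize(embeddings, codebook):
    """Map each embedding to the index of its nearest codebook vector.

    Generic vector-quantization lookup (squared Euclidean distance);
    an assumption for illustration, not the paper's encoder.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))
        for vec in embeddings
    ]


def mask_codes(codes, mask_ratio, mask_id, rng):
    """Replace a random subset of code indices with a mask id.

    Returns the masked sequence and the sorted masked positions, which a
    model would then be trained to reconstruct (training is not shown).
    """
    n_mask = int(len(codes) * mask_ratio)
    masked_positions = set(rng.sample(range(len(codes)), n_mask))
    masked = [mask_id if i in masked_positions else c
              for i, c in enumerate(codes)]
    return masked, sorted(masked_positions)


# Toy codebook with four "microenvironment" codes (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy per-residue embeddings for a four-residue protein (hypothetical values).
residue_embeddings = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.8, 0.9]]

codes = quantize(residue_embeddings, codebook)
masked_codes, masked_positions = mask_codes(codes, 0.5, -1, random.Random(0))
print(codes)
print(masked_codes, masked_positions)
```

Once such a codebook is fixed, encoding a new protein reduces to a lookup per residue, which is the sense in which the abstract describes reusing it as an off-the-shelf tool.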
Related papers
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
(2024-11-04T07:14:28Z)
- Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
(2024-05-16T03:53:21Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
(2024-03-30T05:32:42Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
(2024-02-13T03:51:10Z)
- Effective Protein-Protein Interaction Exploration with PPIretrieval [46.07027715907749]
We propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration.
PPIretrieval searches for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces.
(2024-02-06T03:57:06Z)
- Multimodal Pre-Training Model for Sequence-based Prediction of Protein-Protein Interaction [7.022012579173686]
Pre-training a protein model to learn effective representation is critical for protein-protein interactions.
Most pre-training models for PPIs are sequence-based, which naively adopt the language models used in natural language processing to amino acid sequences.
We propose a multimodal protein pre-training model with three modalities: sequence, structure, and function.
(2021-12-09T10:21:52Z)
- DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction [2.697420611471228]
We present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces.
The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex.
DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid.
(2021-06-06T23:56:27Z)
- Assigning function to protein-protein interactions: a weakly supervised BioBERT based approach using PubMed abstracts [2.208694022993555]
Protein-protein interactions (PPI) are critical to the function of proteins in both normal and diseased cells.
Only a small percentage of PPIs captured in protein interaction databases have annotations of function available.
Here, we aim to label the function type of PPIs by extracting relationships described in PubMed abstracts.
(2020-08-20T01:42:28Z)