Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning
- URL: http://arxiv.org/abs/2405.10348v1
- Date: Thu, 16 May 2024 03:53:21 GMT
- Title: Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning
- Authors: Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li,
- Abstract summary: We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
- Score: 78.38442423223832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependencies among multiple (more than paired) structural scales have not yet been fully captured; (2) it is rarely explored how mutations alter the local conformation of the surrounding microenvironment; (3) pre-training is costly, both in data size and computational burden. In this paper, we first construct a hierarchical prompt codebook to record common microenvironmental patterns at different structural scales independently. Then, we develop a novel codebook pre-training task, namely masked microenvironment modeling, to model the joint distribution of each mutation with their residue types, angular statistics, and local conformational changes in the microenvironment. With the constructed prompt codebook, we encode the microenvironment around each mutation into multiple hierarchical prompts and combine them to flexibly provide information to wild-type and mutated protein complexes about their microenvironmental differences. Such a hierarchical prompt learning framework has demonstrated superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction and a case study of optimizing human antibodies against SARS-CoV-2.
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering [12.738902517872509]
MutaPLM is a unified framework for interpreting and navigating protein mutations with protein language models.
MutaPLM introduces a protein delta network that captures explicit protein mutation representations within a unified feature space.
MutaPLM excels at providing human-understandable explanations for mutational effects and prioritizing novel mutations with desirable properties.
arXiv Detail & Related papers (2024-10-30T12:05:51Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction
Prediction via Microenvironment-Aware Protein Embedding [82.31506767274841]
Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities.
MPAE-PPI encodes microenvironments into chemically meaningful discrete codes via a sufficiently large microenvironment "vocabulary"
MPAE-PPI can scale to PPI prediction with millions of PPIs with superior trade-offs between effectiveness and computational efficiency.
arXiv Detail & Related papers (2024-02-22T09:04:41Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Fast and Functional Structured Data Generators Rooted in
Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z) - Multi-level Protein Representation Learning for Blind Mutational Effect
Prediction [5.207307163958806]
This paper introduces a novel pre-training framework that cascades sequential and geometric analyzers for protein structures.
It guides mutational directions toward desired traits by simulating natural selection on wild-type proteins.
We assess the proposed approach using a public database and two new databases for a variety of variant effect prediction tasks.
arXiv Detail & Related papers (2023-06-08T03:00:50Z) - Accurate and Definite Mutational Effect Prediction with Lightweight
Equivariant Graph Neural Networks [2.381587712372268]
This research introduces a lightweight graph representation learning scheme that efficiently analyzes the microenvironment of wild-type proteins.
Our solution offers a wide range of benefits that make it an ideal choice for the community.
arXiv Detail & Related papers (2023-04-13T09:51:49Z) - SESNet: sequence-structure feature-integrated deep learning method for
data-efficient protein engineering [6.216757583450049]
We develop SESNet, a supervised deep-learning model to predict the fitness for protein mutants.
We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship.
Our model can achieve strikingly high accuracy in prediction of the fitness of protein mutants, especially for the higher order variants.
arXiv Detail & Related papers (2022-12-29T01:49:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.