Accurate and Definite Mutational Effect Prediction with Lightweight
Equivariant Graph Neural Networks
- URL: http://arxiv.org/abs/2304.08299v1
- Date: Thu, 13 Apr 2023 09:51:49 GMT
- Title: Accurate and Definite Mutational Effect Prediction with Lightweight
Equivariant Graph Neural Networks
- Authors: Bingxin Zhou, Outongyi Lv, Kai Yi, Xinye Xiong, Pan Tan, Liang Hong,
Yu Guang Wang
- Abstract summary: This research introduces a lightweight graph representation learning scheme that efficiently analyzes the microenvironment of wild-type proteins.
Our solution offers a wide range of benefits that make it an ideal choice for the community.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Directed evolution, a widely used engineering strategy, faces obstacles in
finding desired mutants among the massive number of candidate modifications. While
deep learning methods learn protein contexts to establish a feasible search
space, many existing models are computationally demanding and fail to predict
how specific mutational tests will affect a protein's sequence or function.
This research introduces a lightweight graph representation learning scheme
that efficiently analyzes the microenvironment of wild-type proteins and
recommends practical higher-order mutations exclusive to the user-specified
protein and function of interest. Our method enables continuous improvement of
the inference model with limited computational resources and a few hundred
mutational training samples, resulting in accurate prediction of variant
effects that exhibit near-perfect correlation with the ground truth across deep
mutational scanning assays of 19 proteins. With its affordability and
applicability to both computer scientists and biochemical laboratories, our
solution offers a wide range of benefits that make it an ideal choice for the
community.
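The abstract's point that directed evolution must search a massive space of candidate modifications can be made concrete with simple counting. The sketch below assumes substitutions only (no insertions or deletions) over the 20 canonical amino acids: choosing k of L positions and giving each chosen site one of the 19 non-wild-type residues yields C(L, k) · 19^k distinct k-point mutants. The helper names `num_k_point_mutants` and `num_mutants_up_to` are hypothetical and purely illustrative; they are not part of the paper's method.

```python
from math import comb


def num_k_point_mutants(length: int, k: int, alphabet_size: int = 20) -> int:
    """Count distinct k-point substitution mutants of a sequence of `length` residues.

    Assumption: substitutions only; each mutated site takes one of the
    (alphabet_size - 1) residues that differ from the wild type.
    """
    if not 0 <= k <= length:
        raise ValueError("k must satisfy 0 <= k <= length")
    return comb(length, k) * (alphabet_size - 1) ** k


def num_mutants_up_to(length: int, max_order: int, alphabet_size: int = 20) -> int:
    """Total number of mutants carrying between 1 and max_order substitutions."""
    return sum(
        num_k_point_mutants(length, k, alphabet_size)
        for k in range(1, max_order + 1)
    )


if __name__ == "__main__":
    L = 300  # hypothetical protein length
    for k in (1, 2, 3):
        print(f"{k}-point mutants: {num_k_point_mutants(L, k):,}")
    print(f"all mutants up to order 3: {num_mutants_up_to(L, 3):,}")
```

For a hypothetical 300-residue protein this gives 5,700 single mutants, 16,190,850 double mutants and over 30 billion triple mutants, which is why exhaustively screening higher-order variants is impractical and a model that ranks them is useful.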
Related papers
- Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
(2024-05-16)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
(2023-12-07)
- Phylogeny-informed fitness estimation
We propose phylogeny-informed fitness estimation, which exploits a population's phylogeny to estimate fitness evaluations.
Our results indicate that phylogeny-informed fitness estimation can mitigate the drawbacks of down-sampled lexicase.
This work serves as an initial step toward improving evolutionary algorithms by exploiting runtime phylogenetic analysis.
(2023-06-06)
- Reprogramming Pretrained Language Models for Protein Sequence Representation Learning
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
(2023-01-05)
- SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering
We develop SESNet, a supervised deep-learning model to predict the fitness for protein mutants.
We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship.
Our model can achieve strikingly high accuracy in prediction of the fitness of protein mutants, especially for the higher order variants.
(2022-12-29)
- Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations.
We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models.
By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins.
(2022-12-20)
- Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes
We interrogate the use of protein language models in characterizing known pathogenic mutations in curated, medically actionable genes.
Systematic analysis of the predicted effects of these compensatory mutations reveals unappreciated structural features of proteins.
We encourage the community to generate and curate rescue mutation experiments to inform the design of more sophisticated co-masking strategies.
(2022-11-18)
- ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution
We propose an efficient, experimental design-oriented closed-loop optimization framework for protein directed evolution.
ODBO employs a combination of a novel low-dimensional protein encoding strategy and Bayesian optimization enhanced with search space prescreening via outlier detection.
We conduct and report four protein directed evolution experiments that substantiate the capability of the proposed framework for finding variants with properties of interest.
(2022-05-19)
- Using Genetic Programming to Predict and Optimize Protein Function
We propose POET, a computational Genetic Programming tool based on evolutionary methods to enhance screening and mutagenesis in Directed Evolution.
As a proof of concept, we use peptides that generate MRI contrast detected by the Chemical Exchange Saturation Transfer mechanism.
Our results indicate that a computational modelling tool like POET can help to find peptides with 400% better functionality than those used before.
(2022-02-08)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
(2020-09-02)
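The Select-ProtoNet summary builds on the Prototypical Network classification rule: each class is represented by the mean embedding (prototype) of its few labelled support examples, and a query is assigned by a softmax over negative squared Euclidean distances to those prototypes. The sketch below shows only that rule, under stated assumptions: embeddings are taken as given (the learned encoder network is omitted), and the selection mechanism Select-ProtoNet adds is not modelled. The function names and toy subtype labels are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def class_prototypes(support: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Return one prototype per class: the mean of that class's support embeddings."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for x, y in zip(support, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(query: Vector, prototypes: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Softmax over negative squared distances; returns (best class, class probabilities)."""
    scores = {y: -squared_distance(query, p) for y, p in prototypes.items()}
    shift = max(scores.values())  # subtract the max score for numerical stability
    weights = {y: math.exp(s - shift) for y, s in scores.items()}
    total = sum(weights.values())
    probs = {y: w / total for y, w in weights.items()}
    return max(probs, key=probs.__getitem__), probs


if __name__ == "__main__":
    # Toy 2-D "patient embeddings" for two hypothetical disease subtypes.
    support = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.2]]
    labels = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]
    protos = class_prototypes(support, labels)
    label, probs = predict([0.3, 0.0], protos)
    print(label, probs)
```

Because prototypes are plain class means, a new subtype can be recognised from a handful of embedded examples without training a separate classifier head, which is what suits this rule to few-shot settings.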