Mathematics-assisted directed evolution and protein engineering
- URL: http://arxiv.org/abs/2306.04658v1
- Date: Tue, 6 Jun 2023 19:27:11 GMT
- Title: Mathematics-assisted directed evolution and protein engineering
- Authors: Yuchi Qiu, Guo-Wei Wei
- Abstract summary: It is experimentally impossible to perform the deep mutational scanning of the entire protein library due to the enormous mutational space.
This has led to the rapid growth of AI-assisted directed evolution (AIDE) or AI-assisted protein engineering (AIPE) as an emerging research field.
We argue that a class of persistent topological Laplacians (PTLs), including persistent Laplacians, persistent path Laplacians, persistent sheaf Laplacians, persistent hypergraph Laplacians, persistent hyperdigraph Laplacians, and evolutionary de Rham-Hodge theory, can effectively overcome the limitations of the current TDA.
- Score: 0.913755431537592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directed evolution is a molecular biology technique that is transforming
protein engineering by creating proteins with desirable properties and
functions. However, it is experimentally impossible to perform the deep
mutational scanning of the entire protein library due to the enormous
mutational space, which scales as $20^N$, where $N$ is the number of amino
acids. This has led to the rapid growth of AI-assisted directed evolution
(AIDE) or AI-assisted protein engineering (AIPE) as an emerging research field.
Aided by advanced natural language processing (NLP) techniques, including
long short-term memory, autoencoder, and transformer, sequence-based embeddings
have been dominant approaches in AIDE and AIPE. Persistent Laplacians, an
emerging technique in topological data analysis (TDA), have made
structure-based embeddings a superb option in AIDE and AIPE. We argue that a
class of persistent topological Laplacians (PTLs), including persistent
Laplacians, persistent path Laplacians, persistent sheaf Laplacians, persistent
hypergraph Laplacians, persistent hyperdigraph Laplacians, and evolutionary de
Rham-Hodge theory, can effectively overcome the limitations of the current TDA
and offer a new generation of more powerful TDA approaches. In the general
framework of topological deep learning, mathematics-assisted directed evolution
(MADE) has great potential for future protein engineering.
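To make the $20^N$ scaling in the abstract concrete, the following minimal sketch counts the full sequence space and the much smaller sets of single- and double-point mutants of one parent sequence. The function names and the chosen lengths are illustrative assumptions and do not come from the paper; the counts are standard combinatorics (20 canonical amino acids, 19 alternatives per mutated site).

```python
from math import comb


def sequence_space_size(n_residues: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of length n_residues (20^N by default)."""
    return alphabet ** n_residues


def k_point_mutants(n_residues: int, k: int, alphabet: int = 20) -> int:
    """Number of variants differing from one parent at exactly k sites.

    Choose the k mutated positions, then one of the (alphabet - 1)
    alternative residues at each of them.
    """
    return comb(n_residues, k) * (alphabet - 1) ** k


if __name__ == "__main__":
    # Hypothetical protein lengths, chosen only for illustration.
    for n in (10, 100, 300):
        total = sequence_space_size(n)
        print(
            f"N={n}: 20^N has {len(str(total))} decimal digits; "
            f"single mutants={k_point_mutants(n, 1)}, "
            f"double mutants={k_point_mutants(n, 2)}"
        )
```

Even the double-mutant count grows quadratically with sequence length, while the full space grows exponentially, which is why exhaustive deep mutational scanning is out of reach for whole proteins.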
Related papers
- MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training [48.398329286769304]
Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families.
MSAGPT is a novel approach to prompt protein structure predictions via MSA generative pretraining in the low MSA regime.
arXiv Detail & Related papers (2024-06-08T04:23:57Z)
- NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z)
- Efficiently Predicting Mutational Effect on Homologous Proteins by Evolution Encoding [7.067145619709089]
EvolMPNN is an efficient model to learn evolution-aware protein embeddings.
Our model performs up to 6.4% better than state-of-the-art methods and attains a 36X inference speedup.
arXiv Detail & Related papers (2024-02-20T23:06:21Z)
- Evolutionary Algorithms Simulating Molecular Evolution: A New Field Proposal [0.0716879432974126]
Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared to the massive search space of all possible amino acid sequences, the set of known functional families is minimal.
One could say nature has a limited protein "vocabulary".
By merging evolutionary algorithms, machine learning (ML), and bioinformatics, we can facilitate the development of completely novel proteins which have never existed before.
arXiv Detail & Related papers (2024-02-01T19:22:02Z)
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
- A Survey on Protein Representation Learning: Retrospect and Prospect [42.38007308086495]
Protein representation learning is a promising research topic for extracting informative knowledge from massive protein sequences or structures.
We introduce the motivations for protein representation learning and formulate it in a general and unified framework.
Next, we divide existing PRL methods into three main categories: sequence-based, structure-based, and sequence-structure co-modeling.
arXiv Detail & Related papers (2022-12-31T04:01:16Z)
- Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC [1.0499611180329804]
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations.
We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models.
By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins.
arXiv Detail & Related papers (2022-12-20T00:26:23Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution [18.726398852721204]
We propose an efficient, experimental design-oriented closed-loop optimization framework for protein directed evolution.
ODBO employs a combination of a novel low-dimensional protein encoding strategy and Bayesian optimization enhanced with search space prescreening via outlier detection.
We conduct and report four protein directed evolution experiments that substantiate the capability of the proposed framework for finding variants with properties of interest.
arXiv Detail & Related papers (2022-05-19T13:21:31Z)