Evolutionary Algorithms Simulating Molecular Evolution: A New Field Proposal
- URL: http://arxiv.org/abs/2403.08797v2
- Date: Mon, 10 Jun 2024 21:49:16 GMT
- Title: Evolutionary Algorithms Simulating Molecular Evolution: A New Field Proposal
- Authors: James S. L. Browning Jr., Daniel R. Tauritz, John Beckmann,
- Abstract summary: Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared to the massive search space of all possible amino acid sequences, the set of known functional families is minimal.
One could say nature has a limited protein "vocabulary"
By merging evolutionary algorithms, machine learning (ML), and bioinformatics, we can facilitate the development of completely novel proteins which have never existed before.
- Score: 0.0716879432974126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins -- the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared to the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." The major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago, or maybe never evolved in the first place. We outline a computational approach to solving this problem. By merging evolutionary algorithms, machine learning (ML), and bioinformatics, we can facilitate the development of completely novel proteins which have never existed before. We envision this work forming a new sub-field of computational evolution we dub evolutionary algorithms simulating molecular evolution (EASME).
Related papers
- Meta-Learning an Evolvable Developmental Encoding [7.479827648985631]
Generative models have shown promise in being learnable representations for black-box optimisation.
Here we present a system that can meta-learn such representation by optimising for a representation's ability to generate quality-diversity.
In more detail, we show our meta-learning approach can find one Neural Cellular Automata, in which cells can attend to different parts of a "DNA" string genome during development.
arXiv Detail & Related papers (2024-06-13T11:52:06Z) - NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bonds, andionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - Efficiently Predicting Mutational Effect on Homologous Proteins by Evolution Encoding [7.067145619709089]
EvolMPNN is an efficient model to learn evolution-aware protein embeddings.
Our model shows up to 6.4% better than state-of-the-art methods and attains 36X inference speedup.
arXiv Detail & Related papers (2024-02-20T23:06:21Z) - Mathematics-assisted directed evolution and protein engineering [0.913755431537592]
It is experimentally impossible to perform the deep mutational scanning of the entire protein library due to the enormous mutational space.
This has led to the rapid growth of AI-assisted directed evolution (AIDE) or AI-assisted protein engineering (AIPE) as an emerging research field.
We argue that a class of persistent topological Laplacians (PTLs), including persistent Laplacians, persistent path Laplacians, persistent sheaf Laplacians, persistent hypergraph Laplacians, persistent hyperdigraph Laplacians, and evolutionary de Rham-Hodge theory, can effectively overcome the limitations
arXiv Detail & Related papers (2023-06-06T19:27:11Z) - Phylogeny-informed fitness estimation [58.720142291102135]
We propose phylogeny-informed fitness estimation, which exploits a population's phylogeny to estimate fitness evaluations.
Our results indicate that phylogeny-informed fitness estimation can mitigate the drawbacks of down-sampled lexicase.
This work serves as an initial step toward improving evolutionary algorithms by exploiting runtime phylogenetic analysis.
arXiv Detail & Related papers (2023-06-06T19:05:01Z) - A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Using Genetic Programming to Predict and Optimize Protein Function [65.25258357832584]
We propose POET, a computational Genetic Programming tool based on evolutionary methods to enhance screening and mutagenesis in Directed Evolution.
As a proof-of-concept we use peptides that generate MRI contrast detected by the Chemical Exchange Saturation Transfer mechanism.
Our results indicate that a computational modelling tool like POET can help to find peptides with 400% better functionality than used before.
arXiv Detail & Related papers (2022-02-08T18:08:08Z) - Modeling Protein Using Large-scale Pretrain Language Model [12.568452480689578]
Interdisciplinary researchers have begun to leverage deep learning methods to model large biological datasets.
Inspired by the similarity between natural language and protein sequences, we use large-scale language models to model evolutionary-scale protein sequences.
Our model can accurately capture evolution information from pretraining on evolutionary-scale individual sequences.
arXiv Detail & Related papers (2021-08-17T04:13:11Z) - Epigenetic opportunities for Evolutionary Computation [0.0]
Evolutionary Computation is a group of biologically inspired algorithms used to solve complex optimisation problems.
It can be split into Evolutionary Algorithms, which take inspiration from genetic inheritance, and Swarm Intelligence algorithms, that take inspiration from cultural inheritance.
This paper breaks down successful bio-inspired algorithms under a contemporary biological framework based on the Extended Evolutionary Synthesis.
arXiv Detail & Related papers (2021-08-10T09:44:53Z) - ProGen: Language Modeling for Protein Generation [47.32931317203297]
Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science.
We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly, structural annotations.
arXiv Detail & Related papers (2020-03-08T04:27:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.