Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
- URL: http://arxiv.org/abs/2401.06173v1
- Date: Mon, 8 Jan 2024 06:33:27 GMT
- Title: Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
- Authors: Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang
- Abstract summary: Protein engineering is a daunting task due to the vast sequence space of any given protein.
Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences.
We propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model.
- Score: 44.356888079704156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While modern biotechnologies allow synthesizing new proteins and function
measurements at scale, efficiently exploring a protein sequence space and
engineering it remains a daunting task due to the vast sequence space of any
given protein. Protein engineering is typically conducted through an iterative
process of adding mutations to the wild-type or lead sequences, recombination
of mutations, and running new rounds of screening. To enhance the efficiency of
such a process, we propose a tree search-based bandit learning method, which
expands a tree starting from the initial sequence with the guidance of a bandit
machine learning model. Under simplified assumptions and a Gaussian Process
prior, we provide theoretical analysis and a Bayesian regret bound,
demonstrating that the combination of local search and bandit learning method
can efficiently discover a near-optimal design. The full algorithm is
compatible with a suite of randomized tree search heuristics, machine learning
models, pre-trained embeddings, and bandit techniques. We test various
instances of the algorithm across benchmark protein datasets using simulated
screens. Experiment results demonstrate that the algorithm is both
sample-efficient and able to find top designs using reasonably small mutation
counts.
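
To make the idea in the abstract concrete, the following is a minimal sketch of a tree search guided by a bandit model. It is not the authors' algorithm: it swaps the Gaussian Process model for a simple ridge-regression UCB model on one-hot features, uses a four-letter toy alphabet, and replaces the wet-lab screen with a made-up fitness function. All names (`LinUCB`, `tree_search_bandit`, `toy_fitness`, `TARGET`) are hypothetical.

```python
import random

ALPHABET = "ACDE"    # toy alphabet (assumption), not the 20 amino acids
TARGET = "EDCAEDCA"  # hidden optimum of the toy fitness (assumption)

def toy_fitness(seq):
    """Stand-in for a simulated screen: number of positions matching TARGET."""
    return float(sum(a == b for a, b in zip(seq, TARGET)))

def one_hot(seq):
    """Flatten a sequence into a one-hot feature vector."""
    vec = [0.0] * (len(seq) * len(ALPHABET))
    for i, ch in enumerate(seq):
        vec[i * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class LinUCB:
    """Ridge-regression bandit model with an upper-confidence bonus."""
    def __init__(self, dim, lam=1.0, beta=1.0):
        # Inverse of the regularized design matrix, initially (lam * I)^-1.
        self.a_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim
        self.beta = beta

    def _matvec(self, x):
        return [dot(row, x) for row in self.a_inv]

    def ucb(self, x):
        ax = self._matvec(x)
        mean = dot(self.b, ax)      # x^T A^-1 b (A^-1 is symmetric)
        var = max(dot(x, ax), 0.0)  # x^T A^-1 x
        return mean + self.beta * var ** 0.5

    def update(self, x, y):
        # Sherman-Morrison rank-one update of the inverse design matrix.
        ax = self._matvec(x)
        denom = 1.0 + dot(x, ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]

def mutate(seq, rng):
    """Return a copy of seq with one random single-site substitution."""
    pos = rng.randrange(len(seq))
    options = [c for c in ALPHABET if c != seq[pos]]
    return seq[:pos] + rng.choice(options) + seq[pos + 1:]

def tree_search_bandit(start, oracle, rounds=20, children=8, batch=4, seed=0):
    rng = random.Random(seed)
    model = LinUCB(len(start) * len(ALPHABET))
    measured = {start: oracle(start)}
    model.update(one_hot(start), measured[start])
    frontier = [start]
    for _ in range(rounds):
        # Expand the tree: random single-mutation children of frontier nodes.
        candidates = {mutate(p, rng) for p in frontier for _ in range(children)}
        candidates = [c for c in candidates if c not in measured]
        # Rank by upper confidence bound; ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda s: (-model.ucb(one_hot(s)), s))
        for s in ranked[:batch]:
            measured[s] = oracle(s)  # one "screening" query
            model.update(one_hot(s), measured[s])
        # The best measured sequences become the next nodes to expand.
        frontier = sorted(measured, key=lambda s: (-measured[s], s))[:batch]
    best = max(measured, key=lambda s: (measured[s], s))
    return best, measured[best]

best_seq, best_value = tree_search_bandit("AAAAAAAA", toy_fitness)
print(best_seq, best_value)
```

Each round queries the oracle at most `batch` times, so the total number of simulated measurements is bounded by `rounds * batch + 1`. Because children differ from their parent by a single substitution, the mutation count relative to the starting sequence grows by at most one per round.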
Related papers
- Protein Design by Integrating Machine Learning with Quantum Annealing and Quantum-inspired Optimization [0.0]
The protein design problem involves finding polypeptide sequences folding into a given three-dimensional structure.
Recent machine learning breakthroughs have enabled accurate and rapid structure predictions.
We introduce a general protein design scheme where algorithmic and technological advancements in machine learning and quantum-inspired algorithms can be integrated.
arXiv Detail & Related papers (2024-07-09T18:42:45Z)
- Reinforcement Learning for Sequence Design Leveraging Protein Language Models [14.477268882311991]
We propose to use protein language models (PLMs) as a reward function to generate new sequences.
We perform extensive experiments on various sequence lengths to benchmark RL-based approaches.
We provide comprehensive evaluations along biological plausibility and diversity of the protein.
arXiv Detail & Related papers (2024-07-03T14:31:36Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Protein Sequence Design with Batch Bayesian Optimisation [0.0]
Protein sequence design is a challenging problem in protein engineering, which aims to discover novel proteins with useful biological functions.
Directed evolution is a widely used approach for protein sequence design, which mimics the evolution cycle in a laboratory environment and conducts an iterative protocol.
We propose a new method based on Batch Bayesian Optimization (Batch BO), a well-established optimization method, for protein sequence design.
arXiv Detail & Related papers (2023-03-18T14:53:20Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- Adaptive machine learning for protein engineering [0.4568777157687961]
We discuss how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement.
First, we discuss how to select sequences through a single round of machine-learning optimization.
Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.
arXiv Detail & Related papers (2021-06-10T02:56:35Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
- AdaLead: A simple and robust adaptive greedy search algorithm for sequence design [55.41644538483948]
We develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead).
AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z)
- Fast differentiable DNA and protein sequence optimization for molecular design [0.0]
Machine learning models that accurately predict biological fitness from sequence are becoming a powerful tool for molecular design.
Here, we build on a previously proposed straight-through approximation method to optimize through discrete sequence samples.
The resulting algorithm, which we call Fast SeqProp, achieves up to 100-fold faster convergence compared to previous versions.
arXiv Detail & Related papers (2020-05-22T17:03:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.