Related papers: Fast differentiable DNA and protein sequence optimization for molecular design

Fast differentiable DNA and protein sequence optimization for molecular design

URL: http://arxiv.org/abs/2005.11275v2
Date: Sun, 20 Dec 2020 22:44:01 GMT
Title: Fast differentiable DNA and protein sequence optimization for molecular design
Authors: Johannes Linder and Georg Seelig
Abstract summary: Machine learning models that accurately predict biological fitness from sequence are becoming a powerful tool for molecular design. Here, we build on a previously proposed straight-through approximation method to optimize through discrete sequence samples. The resulting algorithm, which we call Fast SeqPropProp, achieves up to 100-fold faster convergence compared to previous versions.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Designing DNA and protein sequences with improved function has the potential to greatly accelerate synthetic biology. Machine learning models that accurately predict biological fitness from sequence are becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, this method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence. Here, we build on a previously proposed straight-through approximation method to optimize through discrete sequence samples. By normalizing nucleotide logits across positions and introducing an adaptive entropy variable, we remove bottlenecks arising from overly large or skewed sampling parameters. The resulting algorithm, which we call Fast SeqProp, achieves up to 100-fold faster convergence compared to previous versions of activation maximization and finds improved fitness optima for many applications. We demonstrate Fast SeqProp by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor.

Related papers

Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
arXiv Detail & Related papers (2025-02-20T17:48:45Z)
Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization [44.356888079704156]
Protein engineering is a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences. We propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model.
arXiv Detail & Related papers (2024-01-08T06:33:27Z)
Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
Protein Sequence Design with Batch Bayesian Optimisation [0.0]
Protein sequence design is a challenging problem in protein engineering, which aims to discover novel proteins with useful biological functions. directed evolution is a widely-used approach for protein sequence design, which mimics the evolution cycle in a laboratory environment and conducts an iterative protocol. We propose a new method based on Batch Bayesian Optimization (Batch BO), a well-established optimization method, for protein sequence design.
arXiv Detail & Related papers (2023-03-18T14:53:20Z)
An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives. We show the advantages of ZO sign-based gradient descent (ZO-signGD) We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure. We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication. We prove that preconditioning has an additional benefit that has been previously unexplored. It simultaneously can reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
Adaptive machine learning for protein engineering [0.4568777157687961]
We discuss how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.
arXiv Detail & Related papers (2021-06-10T02:56:35Z)
EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network. Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins [0.0]
Predicting the effect of mutations in proteins is one of the most critical challenges in protein engineering. We use clustering, embedding, and dimensionality reduction techniques to select combinations of physicochemical properties for the encoding stage. We then select the best performing predictive models in each set of properties and create an assembled model.
arXiv Detail & Related papers (2020-10-07T16:35:02Z)
AdaLead: A simple and robust adaptive greedy search algorithm for sequence design [55.41644538483948]
We develop an easy-to-directed, scalable, and robust evolutionary greedy algorithm (AdaLead) AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.