Inverse Protein Folding Using Deep Bayesian Optimization
- URL: http://arxiv.org/abs/2305.18089v1
- Date: Thu, 25 May 2023 02:15:25 GMT
- Title: Inverse Protein Folding Using Deep Bayesian Optimization
- Authors: Natalie Maus and Yimeng Zeng and Daniel Allen Anderson and Phillip
Maffettone and Aaron Solomon and Peyton Greenside and Osbert Bastani and
Jacob R. Gardner
- Abstract summary: Inverse protein folding has surfaced as an important problem in the "top down", de novo design of proteins.
In this paper, we cast the problem of improving generated inverse folds as an optimization problem that we solve using recent advances in "deep" or "latent space" Bayesian optimization.
Our approach consistently produces protein sequences with greatly reduced structural error to the target backbone structure as measured by TM score and RMSD.
- Score: 18.77797005929986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse protein folding -- the task of predicting a protein sequence from its
backbone atom coordinates -- has surfaced as an important problem in the "top
down", de novo design of proteins. Contemporary approaches have cast this
problem as a conditional generative modelling problem, where a large generative
model over protein sequences is conditioned on the backbone. While these
generative models very rapidly produce promising sequences, independent draws
from generative models may fail to produce sequences that reliably fold to the
correct backbone. Furthermore, it is challenging to adapt pure generative
approaches to other settings, e.g., when constraints exist. In this paper, we
cast the problem of improving generated inverse folds as an optimization
problem that we solve using recent advances in "deep" or "latent space"
Bayesian optimization. Our approach consistently produces protein sequences
with greatly reduced structural error to the target backbone structure as
measured by TM score and RMSD while using fewer computational resources.
Additionally, we demonstrate other advantages of an optimization-based approach
to the problem, such as the ability to handle constraints.
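To make the "latent space" Bayesian optimization idea concrete, the sketch below shows a generic loop of that kind: a Gaussian-process surrogate fit over latent points, an expected-improvement acquisition, and a black-box objective standing in for decode-fold-and-score. This is an illustrative sketch, not the paper's implementation; the `decode_and_score` stand-in, latent dimensionality, and evaluation budget are assumptions made only for this example.

```python
# Minimal latent-space Bayesian optimization loop (illustrative sketch).
# In the real setting, a latent point z would be decoded by a generative
# inverse-folding model into a sequence, folded, and scored by RMSD / TM
# score against the target backbone; here a toy objective stands in.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
LATENT_DIM = 8  # assumed latent dimensionality for this toy example


def decode_and_score(z: np.ndarray) -> float:
    """Toy stand-in for: decode latent z -> sequence -> fold -> structural error."""
    # Smooth synthetic objective; lower is better (like an RMSD in Angstroms).
    return float(np.sum((z - 0.5) ** 2))


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement for minimization, given the GP posterior mean/std."""
    sigma = np.maximum(sigma, 1e-9)
    imp = best - mu - xi
    zscore = imp / sigma
    return imp * norm.cdf(zscore) + sigma * norm.pdf(zscore)


# Initial design: a handful of random latent points and their scores.
Z = rng.normal(size=(10, LATENT_DIM))
y = np.array([decode_and_score(z) for z in Z])

gp = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(length_scale=1.0),
    normalize_y=True,
)

for step in range(30):
    gp.fit(Z, y)
    # Optimize the acquisition by scoring a batch of random latent candidates.
    candidates = rng.normal(size=(512, LATENT_DIM))
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, best=y.min())
    z_next = candidates[np.argmax(ei)]
    # Query the (expensive, in the real setting) objective and update the data.
    y_next = decode_and_score(z_next)
    Z = np.vstack([Z, z_next])
    y = np.append(y, y_next)

print(f"best structural error found: {y.min():.4f}")
```

The key property this loop illustrates is sample efficiency: the surrogate lets the optimizer spend its limited budget of expensive decode-fold-and-score evaluations on the most promising latent points, rather than on independent draws from the generative model.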
Related papers
- Non-negative Weighted DAG Structure Learning [12.139158398361868]
We address the problem of learning the true DAG from nodal observations.
We propose a DAG recovery algorithm that comes with recovery guarantees.
arXiv Detail & Related papers (2024-09-12T09:41:29Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have been applied to sequential recommendation.
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Symmetric Tensor Networks for Generative Modeling and Constrained
Combinatorial Optimization [72.41480594026815]
Constrained optimization problems abound in industry, from portfolio optimization to logistics.
One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space.
In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models.
arXiv Detail & Related papers (2022-11-16T18:59:54Z) - Designing Biological Sequences via Meta-Reinforcement Learning and
Bayesian Optimization [68.28697120944116]
We train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection.
We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data.
Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results.
arXiv Detail & Related papers (2022-09-13T18:37:27Z) - Combining Genetic Programming and Particle Swarm Optimization to
Simplify Rugged Landscapes Exploration [7.25130576615102]
We propose a novel method for constructing a smooth surrogate model of the original function.
The proposed algorithm, called the GP-FST-PSO Surrogate Model, achieves satisfactory results in both the search for the global optimum and the production of a visual approximation of the original benchmark function.
arXiv Detail & Related papers (2022-06-07T12:55:04Z) - Generative power of a protein language model trained on multiple
sequence alignments [0.5639904484784126]
Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families.
Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end.
We propose and test an iterative method that directly uses the masked language modeling objective to generate sequences using MSA Transformer.
arXiv Detail & Related papers (2022-04-14T16:59:05Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Intermediate Layer Optimization for Inverse Problems using Deep
Generative Models [86.29330440222199]
ILO is a novel optimization algorithm for solving inverse problems with deep generative models.
We empirically show that our approach outperforms state-of-the-art methods introduced in StyleGAN-2 and PULSE for a wide range of inverse problems.
arXiv Detail & Related papers (2021-02-15T06:52:22Z) - AdaLead: A simple and robust adaptive greedy search algorithm for
sequence design [55.41644538483948]
We develop an easy-to-direct, scalable, and robust evolutionary greedy algorithm (AdaLead); a generic sketch of this style of greedy sequence search appears after this list.
AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z)
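As a companion to the AdaLead entry above, here is a minimal, generic greedy evolutionary loop over amino-acid sequences under a black-box fitness oracle. It is not the AdaLead algorithm itself (which adapts its search behaviour to the fitness landscape); the toy `fitness` oracle, alphabet, sequence length, and population settings are illustrative assumptions only.

```python
# Generic greedy evolutionary search over sequences (illustrative only).
# Each generation keeps the top-scoring sequences and fills the population
# with mutated copies of them, under a black-box fitness oracle.
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
random.seed(0)


def fitness(seq: str) -> float:
    """Toy stand-in for an expensive fitness oracle (e.g., a folding score)."""
    return sum(1.0 for a, b in zip(seq, "ACDEFGHIKL" * 3) if a == b)


def mutate(seq: str, rate: float = 0.05) -> str:
    """Point-mutate each position independently with the given rate."""
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else c for c in seq
    )


# Random initial population of length-30 sequences.
population = ["".join(random.choices(ALPHABET, k=30)) for _ in range(32)]

for generation in range(50):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[: len(scored) // 4]           # keep the top quartile
    children = [mutate(p) for p in parents for _ in range(3)]
    population = parents + children                # greedy elitist update

best = max(population, key=fitness)
print(f"best fitness: {fitness(best):.1f}  sequence: {best}")
```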