Improving Protein Optimization with Smoothed Fitness Landscapes
- URL: http://arxiv.org/abs/2307.00494v3
- Date: Sun, 3 Mar 2024 00:32:07 GMT
- Title: Improving Protein Optimization with Smoothed Fitness Landscapes
- Authors: Andrew Kirjner, Jason Yim, Raman Samusevich, Shahar Bracha, Tommi
Jaakkola, Regina Barzilay, Ila Fiete
- Abstract summary: We propose smoothing the fitness landscape to facilitate protein optimization.
We find optimizing in this smoothed landscape leads to improved performance across multiple methods.
Our method, called Gibbs sampling with Graph-based Smoothing (GGS), achieves a 2.5-fold fitness improvement over its training set in in-silico evaluation.
- Score: 27.30455141469762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to engineer novel proteins with higher fitness for a desired
property would be revolutionary for biotechnology and medicine. Modeling the
combinatorially large space of sequences is infeasible; prior methods often
constrain optimization to a small mutational radius, but this drastically
limits the design space. Instead of heuristics, we propose smoothing the
fitness landscape to facilitate protein optimization. First, we formulate
protein fitness as a graph signal, then use Tikhonov regularization to smooth the
fitness landscape. We find optimizing in this smoothed landscape leads to
improved performance across multiple methods in the GFP and AAV benchmarks.
Second, we achieve state-of-the-art results utilizing discrete energy-based
models and MCMC in the smoothed landscape. Our method, called Gibbs sampling
with Graph-based Smoothing (GGS), demonstrates a unique ability to achieve a
2.5-fold fitness improvement (with in-silico evaluation) over its training set. GGS
demonstrates potential to optimize proteins in the limited data regime. Code:
https://github.com/kirjner/GGS
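The smoothing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a k-nearest-neighbor graph built under Hamming distance over equal-length sequences, and the standard graph Tikhonov objective min_f ||f - y||^2 + gamma * f^T L f, whose closed-form solution is f = (I + gamma L)^{-1} y. The function names, the toy sequences, the fitness values, and the choices of k and gamma are all hypothetical.

```python
import numpy as np


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_graph_laplacian(seqs, k=2):
    """Combinatorial Laplacian L = D - A of a symmetrized kNN graph.

    Assumption: neighbors are chosen by Hamming distance; requires k < len(seqs).
    """
    n = len(seqs)
    dist = np.array([[hamming(a, b) for b in seqs] for a in seqs], dtype=float)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i], kind="stable")[:k]
        adj[i, nbrs] = 1.0
    adj = np.maximum(adj, adj.T)  # make the graph undirected
    return np.diag(adj.sum(axis=1)) - adj


def tikhonov_smooth(y, lap, gamma=1.0):
    """Solve (I + gamma * L) f = y, the minimizer of ||f - y||^2 + gamma * f^T L f."""
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(np.eye(len(y)) + gamma * lap, y)


# Toy data (hypothetical): a noisy fitness signal along a mutational path.
seqs = ["AAAA", "AAAC", "AACC", "ACCC", "CCCC"]
fitness = np.array([0.1, 0.9, 0.2, 0.8, 0.3])

lap = knn_graph_laplacian(seqs, k=2)
smoothed = tikhonov_smooth(fitness, lap, gamma=1.0)
print(smoothed)
```

Larger values of gamma pull the fitness of neighboring sequences closer together, while gamma = 0 returns the original signal unchanged. An optimizer or sampler would then be run against the smoothed values rather than the raw ones.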
Related papers
- A Variational Perspective on Generative Protein Fitness Optimization [14.726139539370307]
We introduce Variational Latent Generative Protein Optimization (VLGPO), a variational perspective on fitness optimization.
Our method embeds protein sequences in a continuous latent space to enable efficient sampling from the fitness distribution.
VLGPO achieves state-of-the-art results on two different protein benchmarks of varying complexity.
arXiv Detail & Related papers (2025-01-31T15:07:26Z) - Geometric Algebra Planes: Convex Implicit Neural Volumes [70.12234371845445]
We show that GA-Planes is equivalent to a sparse low-rank factor plus low-resolution matrix.
We also show that GA-Planes can be adapted for many existing representations.
arXiv Detail & Related papers (2024-11-20T18:21:58Z) - Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space [13.228932754390748]
We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model.
To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space.
Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.
arXiv Detail & Related papers (2024-05-29T11:03:42Z) - HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian
Splatting [48.59338619051709]
HiFi4G is an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage.
It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame.
arXiv Detail & Related papers (2023-12-06T12:36:53Z) - Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the required additional data.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z) - Flexible Isosurface Extraction for Gradient-Based Mesh Optimization [65.76362454554754]
This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field.
We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives.
arXiv Detail & Related papers (2023-08-10T06:40:19Z) - Improving few-shot learning-based protein engineering with evolutionary
sampling [0.0]
We propose a few-shot learning approach to novel protein design that aims to accelerate the expensive wet lab testing cycle.
Our approach is composed of two parts: a semi-supervised transfer learning approach to generate a discrete fitness landscape for a desired protein function and a novel evolutionary Monte Carlo Markov Chain sampling algorithm.
We demonstrate the performance of our approach by experimentally screening predicted high fitness gene activators, resulting in a dramatically improved hit rate compared to existing methods.
arXiv Detail & Related papers (2023-05-23T23:07:53Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Fast differentiable DNA and protein sequence optimization for molecular
design [0.0]
Machine learning models that accurately predict biological fitness from sequence are becoming a powerful tool for molecular design.
Here, we build on a previously proposed straight-through approximation method to optimize through discrete sequence samples.
The resulting algorithm, which we call Fast SeqProp, achieves up to 100-fold faster convergence compared to previous versions.
arXiv Detail & Related papers (2020-05-22T17:03:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.