Related papers: Improving Protein Optimization with Smoothed Fitness Landscapes

URL: http://arxiv.org/abs/2307.00494v3
Date: Sun, 3 Mar 2024 00:32:07 GMT
Title: Improving Protein Optimization with Smoothed Fitness Landscapes
Authors: Andrew Kirjner, Jason Yim, Raman Samusevich, Shahar Bracha, Tommi Jaakkola, Regina Barzilay, Ila Fiete
Abstract summary: We propose smoothing the fitness landscape to facilitate protein optimization. We find optimizing in this smoothed landscape leads to improved performance across multiple methods. Our method, called Gibbs sampling with Graph-based Smoothing (GGS), demonstrates a unique ability to achieve 2.5 fold fitness improvement.
Score: 27.30455141469762
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ability to engineer novel proteins with higher fitness for a desired property would be revolutionary for biotechnology and medicine. Modeling the combinatorially large space of sequences is infeasible; prior methods often constrain optimization to a small mutational radius, but this drastically limits the design space. Instead of heuristics, we propose smoothing the fitness landscape to facilitate protein optimization. First, we formulate protein fitness as a graph signal, then use Tikhonov regularization to smooth the fitness landscape. We find optimizing in this smoothed landscape leads to improved performance across multiple methods in the GFP and AAV benchmarks. Second, we achieve state-of-the-art results utilizing discrete energy-based models and MCMC in the smoothed landscape. Our method, called Gibbs sampling with Graph-based Smoothing (GGS), demonstrates a unique ability to achieve 2.5 fold fitness improvement (with in-silico evaluation) over its training set. GGS demonstrates potential to optimize proteins in the limited data regime. Code: https://github.com/kirjner/GGS
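
Illustrative sketch (not taken from the GGS repository): the abstract describes treating fitness as a signal on a graph of sequences and smoothing it with Tikhonov regularization. A generic form of that idea solves (I + gamma * L) x = y, where y is the noisy fitness vector and L is a graph Laplacian. The Hamming-distance graph, the function names, and the gamma value below are assumptions chosen for illustration; the paper's actual graph construction and hyperparameters may differ.

```python
import numpy as np

def hamming_graph_laplacian(seqs, radius=1):
    """Unnormalized Laplacian of a graph linking equal-length sequences
    whose Hamming distance is at most `radius` (hypothetical construction)."""
    n = len(seqs)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist = sum(a != b for a, b in zip(seqs[i], seqs[j]))
            if dist <= radius:
                adj[i, j] = adj[j, i] = 1.0
    return np.diag(adj.sum(axis=1)) - adj

def tikhonov_smooth(fitness, laplacian, gamma=1.0):
    """Minimize ||x - y||^2 + gamma * x^T L x, whose closed-form
    solution satisfies (I + gamma * L) x = y."""
    n = laplacian.shape[0]
    y = np.asarray(fitness, dtype=float)
    return np.linalg.solve(np.eye(n) + gamma * laplacian, y)

# Toy example: four sequences forming a path graph, with jagged fitness values.
seqs = ["AAA", "AAC", "ACC", "CCC"]
noisy = np.array([0.0, 1.0, 0.0, 1.0])
L = hamming_graph_laplacian(seqs)
smoothed = tikhonov_smooth(noisy, L, gamma=1.0)
```

Larger gamma pulls neighboring sequences toward similar fitness values; gamma = 0 returns the input unchanged. The dense solve is only practical for small graphs.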

Related papers

Steering Generative Models with Experimental Data for Protein Fitness Optimization [22.131533900376457]
Protein fitness optimization involves finding a sequence that maximizes desired quantitative properties in a large design space of possible sequences. Recent developments in steering protein generative models (e.g. diffusion models, language models) offer a promising approach. We show that plug-and-play guidance strategies offer advantages compared to alternatives such as reinforcement learning with protein language models.
arXiv Detail & Related papers (2025-05-21T04:30:48Z)
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
arXiv Detail & Related papers (2025-02-20T17:48:45Z)
A Variational Perspective on Generative Protein Fitness Optimization [14.726139539370307]
We introduce Variational Latent Generative Protein Optimization (VLGPO), a variational perspective on fitness optimization. Our method embeds protein sequences in a continuous latent space to enable efficient sampling from the fitness distribution. VLGPO achieves state-of-the-art results on two different protein benchmarks of varying complexity.
arXiv Detail & Related papers (2025-01-31T15:07:26Z)
Geometric Algebra Planes: Convex Implicit Neural Volumes [70.12234371845445]
We show that GA-Planes is equivalent to a sparse low-rank factor plus low-resolution matrix. We also show that GA-Planes can be adapted for many existing representations.
arXiv Detail & Related papers (2024-11-20T18:21:58Z)
Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space [13.228932754390748]
We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.
arXiv Detail & Related papers (2024-05-29T11:03:42Z)
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting [48.59338619051709]
HiFi4G is an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame.
arXiv Detail & Related papers (2023-12-06T12:36:53Z)
Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the required additional data. Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z)
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization [65.76362454554754]
This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives.
arXiv Detail & Related papers (2023-08-10T06:40:19Z)
Improving few-shot learning-based protein engineering with evolutionary sampling [0.0]
We propose a few-shot learning approach to novel protein design that aims to accelerate the expensive wet lab testing cycle. Our approach is composed of two parts: a semi-supervised transfer learning approach to generate a discrete fitness landscape for a desired protein function and a novel evolutionary Monte Carlo Markov Chain sampling algorithm. We demonstrate the performance of our approach by experimentally screening predicted high fitness gene activators, resulting in a dramatically improved hit rate compared to existing methods.
arXiv Detail & Related papers (2023-05-23T23:07:53Z)
An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various zeroth-order (ZO) optimization methods for optimizing molecular objectives. We show the advantages of ZO sign-based gradient descent (ZO-signGD). We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
Faster Optimization on Sparse Graphs via Neural Reparametrization [15.275428333269453]
We show that a graph neural network can implement an efficient Quasi-Newton method that can speed up optimization by a factor of 10-100x. We show the application of our method on scientifically relevant problems including heat diffusion, synchronization and persistent homology.
arXiv Detail & Related papers (2022-05-26T20:52:18Z)
EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network. Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
Fast differentiable DNA and protein sequence optimization for molecular design [0.0]
Machine learning models that accurately predict biological fitness from sequence are becoming a powerful tool for molecular design. Here, we build on a previously proposed straight-through approximation method to optimize through discrete sequence samples. The resulting algorithm, which we call Fast SeqProp, achieves up to 100-fold faster convergence compared to previous versions.
arXiv Detail & Related papers (2020-05-22T17:03:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.