Deep learning-guided evolutionary optimization for protein design
- URL: http://arxiv.org/abs/2603.02753v1
- Date: Tue, 03 Mar 2026 08:53:12 GMT
- Title: Deep learning-guided evolutionary optimization for protein design
- Authors: Erik Hartman, Di Tang, Johan Malmström
- Abstract summary: We present BoGA, a framework that combines evolutionary search with Bayesian optimization to efficiently navigate the sequence space. By integrating a genetic algorithm as a proposal generator within a surrogate modeling loop, BoGA prioritizes candidates based on prior evaluations and surrogate model predictions. We demonstrate the utility of BoGA through benchmarking on sequence and structure design tasks, followed by its application in designing peptide binders against pneumolysin.
- Score: 1.9779444191427633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing novel proteins with desired characteristics remains a significant challenge due to the large sequence space and the complexity of sequence-function relationships. Efficient exploration of this space to identify sequences that meet specific design criteria is crucial for advancing therapeutics and biotechnology. Here, we present BoGA (Bayesian Optimization Genetic Algorithm), a framework that combines evolutionary search with Bayesian optimization to efficiently navigate the sequence space. By integrating a genetic algorithm as a stochastic proposal generator within a surrogate modeling loop, BoGA prioritizes candidates based on prior evaluations and surrogate model predictions, enabling data-efficient optimization. We demonstrate the utility of BoGA through benchmarking on sequence and structure design tasks, followed by its application in designing peptide binders against pneumolysin, a key virulence factor of Streptococcus pneumoniae. BoGA accelerates the discovery of high-confidence binders, demonstrating the potential for efficient protein design across diverse objectives. The algorithm is implemented within the BoPep suite and is available under an MIT license on GitHub (https://github.com/ErikHartman/bopep).
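The abstract describes the BoGA loop only at a high level: a genetic algorithm generates stochastic proposals cheaply, and a surrogate model fitted to prior evaluations decides which proposals are worth sending to the expensive objective. The minimal Python sketch below illustrates that general pattern under explicit assumptions. The toy objective `toy_objective`, the k-nearest-neighbour surrogate with a distance-based uncertainty, the upper-confidence-bound style acquisition, and every function name and parameter are hypothetical illustrations. They are not BoGA's actual components and not the BoPep API.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_objective(seq, target="MKTAYIAKQR"):
    """Hypothetical stand-in for an expensive evaluation (e.g. a structure-based score).

    Returns the fraction of positions that match an arbitrary target sequence.
    """
    return 1.0 - hamming(seq, target) / len(target)


def surrogate(seq, observed, k=3):
    """Hypothetical k-nearest-neighbour surrogate over Hamming distance.

    mean: average score of the k closest evaluated sequences.
    uncertainty: their average normalized distance to `seq` (farther = less certain).
    """
    nearest = sorted(observed.items(), key=lambda item: hamming(seq, item[0]))[:k]
    mean = sum(score for _, score in nearest) / len(nearest)
    uncertainty = sum(hamming(seq, s) for s, _ in nearest) / (len(nearest) * len(seq))
    return mean, uncertainty


def acquisition(seq, observed, beta):
    """Upper-confidence-bound style score: predicted mean plus an exploration bonus."""
    mean, uncertainty = surrogate(seq, observed)
    return mean + beta * uncertainty


def mutate(seq, rate, rng):
    """Resample each position with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else c for c in seq)


def crossover(a, b, rng):
    """Single-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def ga_proposals(observed, n, rng, n_parents=5, rate=0.1):
    """GA as a stochastic proposal generator: recombine and mutate the best-scoring sequences."""
    parents = sorted(observed, key=observed.get, reverse=True)[:n_parents]
    proposals, seen = [], set(observed)
    for _ in range(50 * n):  # bounded number of attempts
        if len(proposals) == n:
            break
        child = mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
        if child not in seen:  # only propose sequences that were never evaluated
            seen.add(child)
            proposals.append(child)
    return proposals


def boga_like_loop(length=10, n_init=8, n_rounds=15, batch=4, n_props=200, beta=0.5, seed=0):
    rng = random.Random(seed)
    observed = {}  # sequence -> score from the expensive objective
    while len(observed) < n_init:
        s = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        observed[s] = toy_objective(s)
    for _ in range(n_rounds):
        candidates = ga_proposals(observed, n_props, rng)
        # The surrogate ranks many cheap proposals; only the top `batch` are evaluated.
        candidates.sort(key=lambda s: acquisition(s, observed, beta), reverse=True)
        for s in candidates[:batch]:
            observed[s] = toy_objective(s)
    best = max(observed, key=observed.get)
    return best, observed[best], len(observed)
```

Calling `boga_like_loop()` returns the best sequence found, its score, and the number of objective calls (8 initial designs plus 15 rounds of 4, i.e. 68). The design choice being illustrated is the split of labor: the GA proposes hundreds of candidates per round at negligible cost, while the expensive objective is called only on the small batch the surrogate ranks highest.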
Related papers
- Accelerating De Novo Genome Assembly via Quantum-Assisted Graph Optimization with Bitstring Recovery [0.0]
Genome sequencing is essential to decode genetic information, identify organisms, understand diseases and advance personalized medicine. De novo genome assembly presents significant challenges due to its high computational complexity, affecting both time and accuracy. We propose a hybrid approach utilizing a quantum computing-based optimization algorithm integrated with classical pre-processing to expedite the genome assembly process.
arXiv Detail & Related papers (2026-01-29T19:03:55Z)
- Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics [16.336540408998598]
HADES is a Bayesian optimization method utilizing Hamiltonian dynamics to efficiently sample from a structure-aware approximated posterior. A position discretization procedure is introduced to propose discrete protein sequences from such a continuous state system. Experiments demonstrate that our method outperforms state-of-the-art baselines in in-silico evaluations.
arXiv Detail & Related papers (2026-01-16T05:53:53Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization [68.28697120944116]
We train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection.
We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data.
Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results.
arXiv Detail & Related papers (2022-09-13T18:37:27Z)
- ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution [18.726398852721204]
We propose an efficient, experimental design-oriented closed-loop optimization framework for protein directed evolution.
ODBO employs a combination of novel low-dimensional protein encoding strategy and Bayesian optimization enhanced with search space prescreening via outlier detection.
We conduct and report four protein directed evolution experiments that substantiate the capability of the proposed framework for finding variants with properties of interest.
arXiv Detail & Related papers (2022-05-19T13:21:31Z)
- Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders [28.550684606186884]
We develop a new approach, LaMBO, which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head.
We evaluate LaMBO on a small-molecule task based on the ZINC dataset and introduce a new large-molecule task targeting fluorescent proteins.
arXiv Detail & Related papers (2022-03-23T21:58:45Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We present an ablation analysis of these algorithms, along with graphs of each algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
- AdaLead: A simple and robust adaptive greedy search algorithm for sequence design [55.41644538483948]
We develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead).
AdaLead is a remarkably strong benchmark that out-competes more complex state-of-the-art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.