Designing Biological Sequences via Meta-Reinforcement Learning and
Bayesian Optimization
- URL: http://arxiv.org/abs/2209.06259v1
- Date: Tue, 13 Sep 2022 18:37:27 GMT
- Title: Designing Biological Sequences via Meta-Reinforcement Learning and
Bayesian Optimization
- Authors: Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon
- Abstract summary: We train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection.
We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data.
Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results.
- Score: 68.28697120944116
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The ability to accelerate the design of biological sequences can have a
substantial impact on the progress of the medical field. The problem can be
framed as a global optimization problem in which the objective is an expensive
black-box function that can be queried in large batches, but only for a small
number of rounds. Bayesian Optimization is a principled
method for tackling this problem. However, the astronomically large state space
of biological sequences makes brute-force iteration over all possible
sequences infeasible. In this paper, we propose MetaRLBO, in which we train an
autoregressive generative model via Meta-Reinforcement Learning to propose
promising sequences for selection via Bayesian Optimization. We pose this
problem as that of finding an optimal policy over a distribution of MDPs
induced by sampling subsets of the data acquired in the previous rounds. Our
in-silico experiments show that meta-learning over such ensembles provides
robustness against reward misspecification and achieves competitive results
compared to existing strong baselines.
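
The idea of a distribution of MDPs induced by sampling subsets of the acquired data can be illustrated with a minimal sketch. This is not the authors' implementation: the alphabet, the ridge-regression proxy model, the upper-confidence-bound selection rule, and all function names (`one_hot`, `fit_ridge`, `sample_proxy_rewards`, `ucb_select`) are assumptions made for illustration. Each sampled subset yields one proxy reward function, which would serve as the terminal reward of one MDP in which a policy builds a sequence token by token. Policy training itself is omitted.

```python
import numpy as np

ALPHABET = "ACGT"  # hypothetical alphabet, chosen only for this example


def one_hot(seq):
    """Flattened one-hot encoding of a fixed-length sequence."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, ch in enumerate(seq):
        x[i, ALPHABET.index(ch)] = 1.0
    return x.ravel()


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; a stand-in for any proxy reward model."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def sample_proxy_rewards(seqs, scores, n_tasks=4, frac=0.5, seed=0):
    """Fit one proxy reward per random subset of the data acquired so far.

    Each returned function plays the role of the reward of one MDP
    drawn from the task distribution.
    """
    rng = np.random.default_rng(seed)
    X = np.stack([one_hot(s) for s in seqs])
    y = np.asarray(scores, dtype=float)
    k = max(1, int(frac * len(seqs)))
    rewards = []
    for _ in range(n_tasks):
        idx = rng.choice(len(seqs), size=k, replace=False)
        w = fit_ridge(X[idx], y[idx])
        # Bind w as a default argument so each closure keeps its own weights.
        rewards.append(lambda s, w=w: float(one_hot(s) @ w))
    return rewards


def ucb_select(candidates, rewards, batch_size=2, beta=1.0):
    """Pick a batch by ensemble mean plus beta times ensemble std (assumed rule)."""
    preds = np.array([[r(c) for r in rewards] for c in candidates])
    ucb = preds.mean(axis=1) + beta * preds.std(axis=1)
    top = np.argsort(-ucb)[:batch_size]
    return [candidates[i] for i in top]


# Toy data standing in for sequences scored by the black-box oracle.
seqs = ["ACGT", "AAGT", "CCGT", "ACGA", "TTTT", "ACCC"]
scores = [1.0, 0.8, 0.5, 0.7, 0.1, 0.4]
rewards = sample_proxy_rewards(seqs, scores)
candidates = ["ACGT", "TTTA", "AAGA", "CCCC"]
batch = ucb_select(candidates, rewards)
```

In a full loop, the selected batch would be sent to the oracle, its scores appended to the data, and the proxy rewards resampled for the next round.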
Related papers
- Batched Bayesian optimization with correlated candidate uncertainties [44.38372821900645]
We propose an acquisition strategy for discrete optimization motivated by pure exploitation, qPO (multipoint Probability of Optimality).
We apply our method to the model-guided exploration of large chemical libraries and provide empirical evidence that it performs better than or on par with state-of-the-art methods in batched Bayesian optimization.
arXiv Detail & Related papers (2024-10-08T20:13:12Z)
- Reinforcement Learning for Sequence Design Leveraging Protein Language Models [14.477268882311991]
We propose to use protein language models (PLMs) as a reward function to generate new sequences.
We perform extensive experiments on various sequence lengths to benchmark RL-based approaches.
We provide comprehensive evaluations along biological plausibility and diversity of the protein.
arXiv Detail & Related papers (2024-07-03T14:31:36Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- A Study of Bayesian Neural Network Surrogates for Bayesian Optimization [46.97686790714025]
Bayesian neural networks (BNNs) have recently become practical function approximators.
We study BNNs as alternatives to standard GP surrogates for optimization.
arXiv Detail & Related papers (2023-05-31T17:00:00Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders [28.550684606186884]
We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head.
We evaluate LaMBO on a small-molecule task based on the ZINC dataset and introduce a new large-molecule task targeting fluorescent proteins.
arXiv Detail & Related papers (2022-03-23T21:58:45Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
- AdaLead: A simple and robust adaptive greedy search algorithm for sequence design [55.41644538483948]
We develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead).
AdaLead is a remarkably strong benchmark that out-competes more complex state of the art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z)