Related papers: Biological Sequence Design with GFlowNets

Biological Sequence Design with GFlowNets

URL: http://arxiv.org/abs/2203.04115v3
Date: Wed, 24 May 2023 11:21:54 GMT
Title: Biological Sequence Design with GFlowNets
Authors: Moksh Jain, Emmanuel Bengio, Alex-Hernandez Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Micheal Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio
Abstract summary: Design of de novo biological sequences with desired properties often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. This makes the diversity of proposed candidates a key consideration in the ideation phase. We propose an active learning algorithm leveraging uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions.
Score: 75.1642973538266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key consideration in the ideation phase. In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. We also propose a scheme to incorporate existing labeled datasets of candidates, in addition to a reward function, to speed up learning in GFlowNets. We present empirical results on several biological sequence design tasks, and we find that our method generates more diverse and novel batches with high scoring candidates compared to existing approaches.

Related papers

Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection [37.54564513506548]
Generative Language Models rely on autoregressive decoding to produce the output sequence token by token. We introduce an evaluation of a comprehensive collection of decoding-free candidate selection approaches on a comprehensive set of tasks.
arXiv Detail & Related papers (2025-01-28T23:21:28Z)
Efficient Biological Data Acquisition through Inference Set Design [3.9633147697178996]
In this work, we aim to select the smallest set of candidates in order to achieve some desired level of accuracy for the system as a whole. We call this mechanism inference set design, and propose the use of a confidence-based active learning solution to prune out challenging examples.
arXiv Detail & Related papers (2024-10-25T15:34:03Z)
Sample-efficient Multi-objective Molecular Optimization with GFlowNets [5.030493242666028]
We propose a multi-objective Bayesian optimization (MOBO) algorithm leveraging the hypernetwork-based GFlowNets (HN-GFN) Using a single preference-conditioned hypernetwork, HN-GFN learns to explore various trade-offs between objectives. Experiments in various real-world settings demonstrate that our framework predominantly outperforms existing methods in terms of candidate quality and sample efficiency.
arXiv Detail & Related papers (2023-02-08T13:30:28Z)
Multi-Objective GFlowNets [59.16787189214784]
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse optimal solutions, based on GFlowNets.
arXiv Detail & Related papers (2022-10-23T16:15:36Z)
Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning [54.247560894146105]
Inverse design of short single-stranded RNA and DNA sequences (aptamers) is the task of finding sequences that satisfy a set of desired criteria. We propose to use an unsupervised machine learning model known as the Potts model to discover new, useful sequences with controllable sequence diversity.
arXiv Detail & Related papers (2022-08-10T13:30:58Z)
Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling. We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting. Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z)
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation [110.09855163856326]
This paper is about the problem of learning a policy for generating an object from a sequence of actions. We propose GFlowNet, based on a view of the generative process as a flow network. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution.
arXiv Detail & Related papers (2021-06-08T14:21:10Z)
Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round. Our algorithm can be used for online prediction tasks for both adversarial and streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.