Related papers: Towards DNA-Encoded Library Generation with GFlowNets

URL: http://arxiv.org/abs/2404.10094v1
Date: Mon, 15 Apr 2024 19:01:20 GMT
Title: Towards DNA-Encoded Library Generation with GFlowNets
Authors: Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio
Abstract summary: One of the key challenges in using DELs is library design. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. We evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach.
Score: 35.09890349911668
License: http://creativecommons.org/licenses/by/4.0/
Abstract: DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.

Related papers

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [57.09163579304332]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages, beginning with planning, which designs the system architecture with diagrams, identifies file dependencies, and generates configuration files. We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z)
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process. Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
arXiv Detail & Related papers (2025-02-17T05:37:02Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
We introduce a novel feature tree-based synthesis framework inspired by Abstract Syntax Trees (AST). Unlike AST, which captures the syntactic structure of code, our framework models semantic relationships between code elements. We fine-tuned widely-used base models to create the EpiCoder series, achieving state-of-the-art performance at both the function and file levels.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
Improving GFlowNets with Monte Carlo Tree Search [6.497027864860203]
Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. We propose to enhance the planning capabilities of GFlowNets by applying Monte Carlo Tree Search (MCTS). Our experiments demonstrate that this approach improves the sample efficiency of GFlowNet training and the generation fidelity of pre-trained GFlowNet models.
arXiv Detail & Related papers (2024-06-19T15:58:35Z)
RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions. RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets [86.43523688236077]
Combinatorial optimization (CO) problems are often NP-hard and out of reach for exact algorithms. GFlowNets have emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially. In this paper, we design Markov decision processes (MDPs) for different problems and propose to train conditional GFlowNets to sample from the solution space.
arXiv Detail & Related papers (2023-05-26T15:13:09Z)
torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library for GFlowNets. It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)
An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries [1.5495593104596397]
Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. Recent years have witnessed rapid growth in these libraries from millions to trillions of compounds, hiding undiscovered, potent hits for a variety of therapeutic targets. We propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE) for navigating these ultra-large libraries.
arXiv Detail & Related papers (2022-10-19T15:43:13Z)
GFlowCausal: Generative Flow Networks for Causal Discovery [27.51595081346858]
We propose GFlowCausal, a novel approach to learning a Directed Acyclic Graph (DAG) from observational data. GFlowCausal aims to learn the best policy to generate high-reward DAGs by sequential actions with probabilities proportional to predefined rewards. We conduct extensive experiments on both synthetic and real datasets, and the results show that the proposed approach is superior and also performs well in a large-scale setting.
arXiv Detail & Related papers (2022-10-15T04:07:39Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
GPflux: A Library for Deep Gaussian Processes [31.207566616050574]
GPflux is a Python library for Bayesian deep learning with a strong emphasis on deep Gaussian processes (DGPs). It is compatible with and built on top of the Keras deep learning ecosystem. GPflux relies on GPflow for most of its GP objects and operations, which makes it an efficient, modular and extendable library.
arXiv Detail & Related papers (2021-04-12T17:41:18Z)
Torch-Struct: Deep Structured Prediction Library [138.5262350501951]
We introduce Torch-Struct, a library for structured prediction. Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API.
arXiv Detail & Related papers (2020-02-03T16:43:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.