Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation
- URL: http://arxiv.org/abs/2503.06337v2
- Date: Tue, 25 Mar 2025 19:56:33 GMT
- Title: Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation
- Authors: Mohit Pandey, Gopeshh Subbaraj, Artem Cherkasov, Martin Ester, Emmanuel Bengio,
- Abstract summary: Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures.<n>In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks.<n>We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors.
- Score: 6.495442425890008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from rewards treated as unnormalized distributions. Previous works in this framework often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further implement a goal-conditioned finetuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on a subset of ZINC dataset, and by employing robust evaluation metrics we show the effectiveness of our approach when compared to other relevant baseline methods for a wide range of drug design tasks.
Related papers
- Learning Hierarchical Interaction for Accurate Molecular Property Prediction [8.488251667425887]
We propose a Hierarchical Interaction Message Passing Mechanism, which serves as the foundation of our novel model, HimNet.
Our method enables interaction-aware representation learning across atomic, motif, and molecular levels via hierarchical attention-guided message passing.
Our method exhibits promising hierarchical interpretability, aligning well with chemical intuition on representative molecules.
arXiv Detail & Related papers (2025-04-28T15:19:28Z) - DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives.
This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z) - GFlowNet Pretraining with Inexpensive Rewards [2.924067540644439]
We introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively.
We propose an unsupervised pre-training approach using offline drug-like molecule datasets, which conditions A-GFNs on inexpensive yet informative molecular descriptors.
We further our method by implementing a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties.
arXiv Detail & Related papers (2024-09-15T11:42:17Z) - Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties.
It can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score.
arXiv Detail & Related papers (2024-07-01T06:10:29Z) - DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and diffusion model.
We show that DecompOpt can efficiently generate molecules with improved properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z) - TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design [3.45184803671951]
TacoGFN is a novel GFlowNet-based approach for structure-based drug design.
It can generate molecules conditioned on any protein pocket structure with probabilities proportional to its affinity and property rewards.
In the generative setting for CrossDocked 2020 benchmark, TacoGFN attains a state-of-the-art success rate of $56.0%$ and $-8.44$ kcal/mol in median Vina Dock score.
arXiv Detail & Related papers (2023-10-05T00:45:04Z) - HiGNN: Hierarchical Informative Graph Neural Networks for Molecular
Property Prediction Equipped with Feature-Wise Attention [5.735627221409312]
We propose a well-designed hierarchical informative graph neural networks framework (termed HiGNN) for predicting molecular property.
Experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets.
arXiv Detail & Related papers (2022-08-30T05:16:15Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative differential equation (SDE)
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - Target-aware Molecular Graph Generation [37.937378787812264]
We propose SiamFlow, which forces the flow to fit the distribution of target sequence embeddings in latent space.
Specifically, we employ an alignment loss and a uniform loss to bring target sequence embeddings and drug graph embeddings into agreements.
Experiments quantitatively show that our proposed method learns meaningful representations in the latent space toward the target-aware molecular graph generation.
arXiv Detail & Related papers (2022-02-10T04:31:14Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.