A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules
- URL: http://arxiv.org/abs/2505.16365v1
- Date: Thu, 22 May 2025 08:21:27 GMT
- Title: A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules
- Authors: Manuel Ruiz-Botella, Marta Sales-Pardo, Roger GuimerĂ ,
- Abstract summary: We introduce CoCoGraph, a collaborative and constrained graph diffusion model capable of generating chemically valid molecules.<n>Thanks to the constraints built into the model and to the collaborative mechanism, CoCoGraph outperforms state-of-the-art approaches on standard benchmarks.<n>We created a database of 8.2M million synthetically generated molecules and conducted a Turing-like test with organic chemistry experts to further assess the plausibility of the generated molecules.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Developing new molecular compounds is crucial to address pressing challenges, from health to environmental sustainability. However, exploring the molecular space to discover new molecules is difficult due to the vastness of the space. Here we introduce CoCoGraph, a collaborative and constrained graph diffusion model capable of generating molecules that are guaranteed to be chemically valid. Thanks to the constraints built into the model and to the collaborative mechanism, CoCoGraph outperforms state-of-the-art approaches on standard benchmarks while requiring up to an order of magnitude fewer parameters. Analysis of 36 chemical properties also demonstrates that CoCoGraph generates molecules with distributions more closely matching real molecules than current models. Leveraging the model's efficiency, we created a database of 8.2M million synthetically generated molecules and conducted a Turing-like test with organic chemistry experts to further assess the plausibility of the generated molecules, and potential biases and limitations of CoCoGraph.
Related papers
- SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling [29.856853267388924]
We present SynCoGen, a framework that combines masked graph diffusion and flow matching for synthesizable 3D molecule generation.<n>To train the model, we curated SynSpace, a dataset containing over 600K-aware building block graphs and 3.3M conformers.
arXiv Detail & Related papers (2025-07-16T00:36:35Z) - Torsional-GFN: a conditional conformation generator for small molecules [75.91029322687771]
We introduce a conditional GFlowNet specifically designed to sample conformations of molecules proportionally to their Boltzmann distribution.<n>Our work presents a promising avenue for scaling the proposed approach to larger molecular systems.
arXiv Detail & Related papers (2025-07-15T21:53:25Z) - FragFM: Efficient Fragment-Based Molecular Generation via Discrete Flow Matching [0.3345437353879254]
We introduce FragFM, a novel fragment-based discrete flow matching framework for molecular graph generation.<n>FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoding mechanism to reconstruct atom-level details.
arXiv Detail & Related papers (2025-02-19T07:01:00Z) - Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML)<n>KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.<n>This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - MolMiner: Transformer architecture for fragment-based autoregressive generation of molecular stories [7.366789601705544]
Chemical validity, interpretability of the generation process and flexibility to variable molecular sizes are among some of the remaining challenges for generative models in computational materials design.
We propose an autoregressive approach that decomposes molecular generation into a sequence of discrete and interpretable steps.
Our results show that the model can effectively bias the generation distribution according to the prompted multi-target objective.
arXiv Detail & Related papers (2024-11-10T22:00:55Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual
Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.<n> Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.<n>By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative differential equation (SDE)
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - Interpretable Molecular Graph Generation via Monotonic Constraints [19.401468196146336]
Deep graph generative models treat molecule design as graph generation problems.
Existing models have many shortcomings, including poor interpretability and controllability toward desired molecular properties.
This paper proposes new methodologies for molecule generation with interpretable and deep controllable models.
arXiv Detail & Related papers (2022-02-28T08:35:56Z) - Conditional Constrained Graph Variational Autoencoders for Molecule
Design [70.59828655929194]
We present Conditional Constrained Graph Variational Autoencoder (CCGVAE), a model that implements this key-idea in a state-of-the-art model.
We show improved results on several evaluation metrics on two commonly adopted datasets for molecule generation.
arXiv Detail & Related papers (2020-09-01T21:58:07Z) - The Synthesizability of Molecules Proposed by Generative Models [3.032184156362992]
Discovery of functional molecules is an expensive and time-consuming process.
One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization.
These techniques can suggest novel molecular structures intended to maximize a multi-objective function.
However, the utility of these approaches is stymied by ignorance of synthesizability.
arXiv Detail & Related papers (2020-02-17T15:41:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.