SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
- URL: http://arxiv.org/abs/2507.11818v1
- Date: Wed, 16 Jul 2025 00:36:35 GMT
- Title: SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
- Authors: Andrei Rekesh, Miruna Cretu, Dmytro Shevchuk, Vignesh Ram Somnath, Pietro Liò, Robert A. Batey, Mike Tyers, Michał Koziarski, Cheng-Hao Liu
- Abstract summary: We present SynCoGen, a framework that combines masked graph diffusion and flow matching for synthesizable 3D molecule generation. To train the model, we curated SynSpace, a dataset containing over 600K synthesis-aware building block graphs and 3.3M conformers.
- Score: 29.856853267388924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring synthesizability in generative small molecule design remains a major challenge. While recent developments in synthesizable molecule generation have demonstrated promising results, these efforts have been largely confined to 2D molecular graph representations, limiting the ability to perform geometry-based conditional generation. In this work, we present SynCoGen (Synthesizable Co-Generation), a single framework that combines simultaneous masked graph diffusion and flow matching for synthesizable 3D molecule generation. SynCoGen samples from the joint distribution of molecular building blocks, chemical reactions, and atomic coordinates. To train the model, we curated SynSpace, a dataset containing over 600K synthesis-aware building block graphs and 3.3M conformers. SynCoGen achieves state-of-the-art performance in unconditional small molecule graph and conformer generation, and the model delivers competitive performance in zero-shot molecular linker design for protein ligand generation in drug discovery. Overall, this multimodal formulation represents a foundation for future applications enabled by non-autoregressive molecular generation, including analog expansion, lead optimization, and direct structure conditioning.
Related papers
- Synthesizable by Design: A Retrosynthesis-Guided Framework for Molecular Analog Generation [0.5852077003870417]
We introduce SynTwins, a novel retrosynthesis-guided molecular analog design framework.
In comparative evaluations, SynTwins demonstrates superior performance in generating synthetically accessible analogs.
Our benchmarking across diverse molecular datasets demonstrates that SynTwins effectively bridges the gap between computational design and experimental synthesis.
arXiv Detail & Related papers (2025-07-03T16:14:57Z)
- A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules [0.0]
We introduce CoCoGraph, a collaborative and constrained graph diffusion model capable of generating chemically valid molecules.
Thanks to the constraints built into the model and to the collaborative mechanism, CoCoGraph outperforms state-of-the-art approaches on standard benchmarks.
We created a database of 8.2M synthetically generated molecules and conducted a Turing-like test with organic chemistry experts to further assess the plausibility of the generated molecules.
arXiv Detail & Related papers (2025-05-22T08:21:27Z)
- SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models [2.4120602995529317]
We present a novel approach by fine-tuning Meta's Llama3 Large Language Models to create SynLlama.
SynLlama generates full synthetic pathways made of commonly accessible building blocks and robust organic reaction templates.
We find that SynLlama, even without training on external building blocks, can effectively generalize to unseen yet purchasable building blocks.
arXiv Detail & Related papers (2025-03-16T18:30:56Z)
- SynthFormer: Equivariant Pharmacophore-based Generation of Synthesizable Molecules for Ligand-Based Drug Design [19.578382119811238]
We introduce SynthFormer, a novel machine learning model that generates fully synthesizable molecules, structured as synthetic trees, by introducing both 3D information and pharmacophores as input.
It is a first-of-its-kind approach that could provide capabilities for designing active molecules based on pharmacophores.
arXiv Detail & Related papers (2024-10-03T17:38:46Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints [16.21161274235011]
We introduce SynFlowNet, a GFlowNet model whose action space uses chemical reactions and purchasable reactants to sequentially build new molecules.
By incorporating forward synthesis as an explicit constraint of the generative mechanism, we aim at bridging the gap between in silico molecular generation and real world synthesis capabilities.
arXiv Detail & Related papers (2024-05-02T10:15:59Z)
- An Equivariant Generative Framework for Molecular Graph-Structure Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z)
- MiDi: Mixed Graph and 3D Denoising Diffusion for Molecule Generation [47.15291538945242]
This work introduces MiDi, a novel diffusion model for jointly generating molecular graphs and their corresponding 3D arrangement of atoms.
Unlike existing methods that rely on predefined rules to determine molecular bonds based on the 3D conformation, MiDi offers an end-to-end differentiable approach that streamlines the molecule generation process.
arXiv Detail & Related papers (2023-02-17T18:27:14Z)
- DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding [51.970607704953096]
Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one.
In real-world molecular systems, the interactions among atoms in an entire molecule are global, leading to the energy function pair-coupled among atoms.
In this work, a generative diffusion model for molecular 3D structures based on target proteins is established, at a full-atom level in a non-autoregressive way.
arXiv Detail & Related papers (2022-11-21T07:02:15Z)
- In-Pocket 3D Graphs Enhance Ligand-Target Compatibility in Generative Small-Molecule Creation [0.0]
We present a graph-based generative modeling technology that encodes explicit 3D protein-ligand contacts within a relational graph architecture.
The models combine a conditional variational autoencoder that allows for activity-specific molecule generation with putative contact generation that provides predictions of molecular interactions within the target binding pocket.
arXiv Detail & Related papers (2022-04-05T22:53:51Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
- Learning Graph Models for Retrosynthesis Prediction [90.15523831087269]
Retrosynthesis prediction is a fundamental problem in organic synthesis.
This paper introduces a graph-based approach that capitalizes on the idea that the graph topology of precursor molecules is largely unaltered during a chemical reaction.
Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods.
arXiv Detail & Related papers (2020-06-12T09:40:42Z)
- Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design.
In this setup, the agent learns to navigate through the immense synthetically accessible chemical space.
We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.