FragFM: Efficient Fragment-Based Molecular Generation via Discrete Flow Matching
- URL: http://arxiv.org/abs/2502.15805v1
- Date: Wed, 19 Feb 2025 07:01:00 GMT
- Title: FragFM: Efficient Fragment-Based Molecular Generation via Discrete Flow Matching
- Authors: Joongwon Lee, Seonghwan Kim, Woo Youn Kim
- Abstract summary: We introduce FragFM, a novel fragment-based discrete flow matching framework for molecular graph generation. FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoding mechanism to reconstruct atom-level details.
- Score: 0.3345437353879254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce FragFM, a novel fragment-based discrete flow matching framework for molecular graph generation. FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoding mechanism to reconstruct atom-level details. This approach reduces computational complexity while maintaining high chemical validity, enabling more efficient and scalable molecular generation. We benchmark FragFM against state-of-the-art diffusion- and flow-based models on standard molecular generation benchmarks and natural product datasets, demonstrating superior performance in validity, property control, and sampling efficiency. Notably, FragFM achieves over 99% validity with significantly fewer sampling steps, improving scalability while preserving molecular diversity. These results highlight the potential of fragment-based generative modeling for large-scale, property-aware molecular design, paving the way for more efficient exploration of chemical space.
Related papers
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network. We develop a robust decoder that bridges latent embeddings and molecular structures. Experiments show DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - GenMol: A Drug Discovery Generalist with Discrete Diffusion [43.29814519270451]
Generalist Molecular generative model (GenMol) is a versatile framework that addresses various aspects of the drug discovery pipeline. Under the discrete diffusion framework, we introduce fragment remasking, a strategy that optimizes molecules by replacing fragments with masked tokens. GenMol significantly outperforms the previous GPT-based model trained on SAFE representations in de novo generation and fragment-constrained generation.
arXiv Detail & Related papers (2025-01-10T18:30:05Z) - MolMiner: Transformer architecture for fragment-based autoregressive generation of molecular stories [7.366789601705544]
Chemical validity, interpretability of the generation process, and flexibility to variable molecular sizes are among the remaining challenges for generative models in computational materials design.
We propose an autoregressive approach that decomposes molecular generation into a sequence of discrete and interpretable steps.
Our results show that the model can effectively bias the generation distribution according to the prompted multi-target objective.
arXiv Detail & Related papers (2024-11-10T22:00:55Z) - Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design [17.175846006359674]
We introduce Conditional Latent Space Molecular Scaffold Optimization (CLaSMO) to modify molecules strategically while maintaining similarity to the original input.
Our LSBO setting improves the sample-efficiency of our optimization, and our modification approach helps us to obtain molecules with higher chances of real-world applicability.
We also provide an open-source web application that enables chemical experts to apply CLaSMO in a Human-in-the-Loop setting.
arXiv Detail & Related papers (2024-11-03T03:17:38Z) - LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.
LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space.
We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties [33.2976176283611]
We present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs.
To get desirable molecular fragments, we develop a novel electronic-effect-based fragmentation method.
We show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.
arXiv Detail & Related papers (2023-10-05T11:43:21Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z) - Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.