A Deep Generative Model for Fragment-Based Molecule Generation
- URL: http://arxiv.org/abs/2002.12826v1
- Date: Fri, 28 Feb 2020 15:55:11 GMT
- Title: A Deep Generative Model for Fragment-Based Molecule Generation
- Authors: Marco Podda, Davide Bacciu, Alessio Micheli
- Abstract summary: We develop a language model for small molecular substructures called fragments.
In other words, we generate molecules fragment by fragment, instead of atom by atom.
We show experimentally that our model largely outperforms other language model-based competitors.
- Score: 21.258861822241272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecule generation is a challenging open problem in cheminformatics.
Currently, deep generative approaches addressing the challenge belong to two
broad categories, differing in how molecules are represented. One approach
encodes molecular graphs as strings of text, and learns their corresponding
character-based language model. Another, more expressive, approach operates
directly on the molecular graph. In this work, we address two limitations of
the former: generation of invalid and duplicate molecules. To improve validity
rates, we develop a language model for small molecular substructures called
fragments, loosely inspired by the well-known paradigm of Fragment-Based Drug
Design. In other words, we generate molecules fragment by fragment, instead of
atom by atom. To improve uniqueness rates, we present a frequency-based masking
strategy that helps generate molecules with infrequent fragments. We show
experimentally that our model largely outperforms other language model-based
competitors, reaching state-of-the-art performances typical of graph-based
approaches. Moreover, generated molecules display molecular properties similar
to those in the training sample, even in the absence of explicit task-specific
supervision.
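The frequency-based masking strategy described in the abstract can be illustrated with a minimal sketch: at sampling time, the most frequent fragments in the vocabulary are masked out so the model is pushed toward rarer fragments, which improves uniqueness. The function below is a hypothetical simplification (the names `masked_sample`, `fragment_counts`, and `mask_top_k` are illustrative, not the paper's API).

```python
import math
import random
from collections import Counter

def masked_sample(fragment_logits, fragment_counts, mask_top_k=2):
    """Sample a fragment index from softmax(fragment_logits), but zero out
    the probability of the mask_top_k most frequent fragments.

    Hypothetical sketch of a frequency-based masking strategy; not the
    paper's actual implementation.
    """
    # Identify the most frequent fragments in the training vocabulary.
    frequent = {f for f, _ in Counter(fragment_counts).most_common(mask_top_k)}
    # Softmax weights, with frequent fragments masked to zero.
    weights = [0.0 if f in frequent else math.exp(logit)
               for f, logit in enumerate(fragment_logits)]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample one fragment index from the masked distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

With uniform logits and counts `{0: 100, 1: 90, 2: 1}` and `mask_top_k=2`, fragments 0 and 1 are masked, so fragment 2 is always selected; in a real model the masked set would be recomputed per step and the logits would come from the fragment language model.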
Related papers
- LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.
LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space.
We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- MAGNet: Motif-Agnostic Generation of Molecules from Shapes [16.188301768974]
MAGNet is a graph-based model that generates abstract shapes before allocating atom and bond types.
We demonstrate that MAGNet's improved expressivity leads to molecules with more topologically distinct structures.
arXiv Detail & Related papers (2023-05-30T15:29:34Z)
- De Novo Molecular Generation via Connection-aware Motif Mining [197.97528902698966]
We propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs.
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information.
Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected.
arXiv Detail & Related papers (2023-02-02T14:40:47Z)
- Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Fragment-based Sequential Translation for Molecular Optimization [23.152338167332374]
We propose a flexible editing paradigm that generates molecules using learned molecular fragments.
We use a variational autoencoder to encode molecular fragments in a coherent latent space.
We then utilize these fragments as a vocabulary for editing molecules to explore the complex chemical property space.
arXiv Detail & Related papers (2021-10-26T21:20:54Z)
- De Novo Molecular Generation with Stacked Adversarial Model [24.83456726428956]
Conditional generative adversarial models have recently been proposed as promising approaches for de novo drug design.
We propose a new generative model which extends an existing adversarial autoencoder based model by stacking two models together.
Our stacked approach generates more valid molecules, as well as molecules that are more similar to known drugs.
arXiv Detail & Related papers (2021-10-24T14:23:16Z)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
- Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn latent space energy-based prior model with SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.