De Novo Molecular Generation via Connection-aware Motif Mining
- URL: http://arxiv.org/abs/2302.01129v1
- Date: Thu, 2 Feb 2023 14:40:47 GMT
- Title: De Novo Molecular Generation via Connection-aware Motif Mining
- Authors: Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang,
Yongdong Zhang, Feng Wu and Tie-Yan Liu
- Abstract summary: We propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs.
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information.
Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: De novo molecular generation is an essential task for scientific discovery.
Recently, fragment-based deep generative models have attracted much research
attention due to their flexibility in generating novel molecules based on
existing molecule fragments. However, the motif vocabulary, i.e., the
collection of frequent fragments, is usually built upon heuristic rules, which
makes it difficult to capture common substructures from large numbers of
molecules. In this work, we propose a new method, MiCaM, to generate molecules
based on mined connection-aware motifs. Specifically, it leverages a
data-driven algorithm to automatically discover motifs from a molecule library
by iteratively merging subgraphs based on their frequency. The obtained motif
vocabulary consists of not only molecular motifs (i.e., the frequent
fragments), but also their connection information, indicating how the motifs
are connected with each other. Based on the mined connection-aware motifs,
MiCaM builds a connection-aware generator, which simultaneously picks up motifs
and determines how they are connected. We test our method on
distribution-learning benchmarks (i.e., generating novel molecules to resemble
the distribution of a given training set) and goal-directed benchmarks (i.e.,
generating molecules with target properties), and achieve significant
improvements over previous fragment-based baselines. Furthermore, we
demonstrate that our method can effectively mine domain-specific motifs for
different tasks.
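The frequency-based merging described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`mine_motifs`, `connection_sites`), the `(atoms, bonds)` input format, and the use of sorted atom symbols as a fragment label are all assumptions made for illustration. A real pipeline would more likely identify fragments by a canonical structure representation. The sketch starts with every atom as its own fragment, repeatedly counts the patterns formed by bonded fragment pairs across the library, and merges the most frequent one. The atoms where a fragment touches the rest of the molecule stand in for the "connection information".

```python
from collections import Counter


def label(atoms, frag):
    # Crude stand-in for a canonical fragment identifier: sorted atom symbols.
    return "".join(sorted(atoms[i] for i in frag))


def adjacent_pairs(bonds, frags):
    # Pairs of distinct fragments joined by at least one bond.
    owner = {i: f for f in frags for i in f}
    return {frozenset((owner[i], owner[j])) for i, j in bonds if owner[i] != owner[j]}


def connection_sites(bonds, frag):
    # Atoms of `frag` that are bonded to an atom outside it.
    sites = set()
    for i, j in bonds:
        if (i in frag) != (j in frag):
            sites.add(i if i in frag else j)
    return sorted(sites)


def mine_motifs(molecules, num_merges):
    # Every atom starts as its own single-atom fragment.
    states = [[frozenset([i]) for i in range(len(atoms))] for atoms, _ in molecules]
    vocab = []
    for _ in range(num_merges):
        # Count the pattern each bonded fragment pair would form if merged.
        counts = Counter()
        for (atoms, bonds), frags in zip(molecules, states):
            for a, b in map(tuple, adjacent_pairs(bonds, frags)):
                counts[label(atoms, a | b)] += 1
        if not counts:
            break
        # Most frequent merged pattern; ties are broken by label for determinism.
        best = max(counts, key=lambda lab: (counts[lab], lab))
        vocab.append(best)
        # Apply the chosen merge wherever it occurs, one pair at a time.
        for k, (atoms, bonds) in enumerate(molecules):
            frags = states[k]
            changed = True
            while changed:
                changed = False
                for a, b in map(tuple, adjacent_pairs(bonds, frags)):
                    if label(atoms, a | b) == best:
                        frags = [f for f in frags if f not in (a, b)] + [a | b]
                        changed = True
                        break
            states[k] = frags
    return vocab, states


# Toy library: methanol, ethanol, dimethyl ether (heavy atoms only).
molecules = [
    (["C", "O"], [(0, 1)]),
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O", "C"], [(0, 1), (1, 2)]),
]
vocab, states = mine_motifs(molecules, num_merges=1)
```

In this toy library the C-O pattern is the most frequent bonded pair, so it is merged first. In ethanol the resulting two-atom fragment keeps one connection site, the carbon that is still bonded to the remaining atom.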
Related papers
- Zero Shot Molecular Generation via Similarity Kernels [0.6597195879147557]
We present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation.
SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules.
We also release an interactive web tool that allows users to generate structures with SiMGen online.
arXiv Detail & Related papers (2024-02-13T17:53:44Z)
- Atom-Motif Contrastive Transformer for Molecular Property Prediction [68.85399466928976]
Graph Transformer (GT) models have been widely used in the task of Molecular Property Prediction (MPP).
We propose a novel Atom-Motif Contrastive Transformer (AMCT) which explores atom-level interactions and considers motif-level interactions.
Our proposed AMCT is extensively evaluated on seven popular benchmark datasets, and both quantitative and qualitative results firmly demonstrate its effectiveness.
arXiv Detail & Related papers (2023-10-11T10:03:10Z)
- MAGNet: Motif-Agnostic Generation of Molecules from Shapes [16.188301768974]
MAGNet is a graph-based model that generates abstract shapes before allocating atom and bond types.
We demonstrate that MAGNet's improved expressivity leads to molecules with more topologically distinct structures.
arXiv Detail & Related papers (2023-05-30T15:29:34Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design [82.23006955069229]
We propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design.
Our model places missing atoms in between and designs a molecule incorporating all the initial fragments.
We demonstrate that DiffLinker outperforms other methods on the standard datasets generating more diverse and synthetically-accessible molecules.
arXiv Detail & Related papers (2022-10-11T09:13:37Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Fragment-based Sequential Translation for Molecular Optimization [23.152338167332374]
We propose a flexible editing paradigm that generates molecules using learned molecular fragments.
We use a variational autoencoder to encode molecular fragments in a coherent latent space.
We then utilize this latent space as a vocabulary for editing molecules to explore the complex chemical property space.
arXiv Detail & Related papers (2021-10-26T21:20:54Z)
- Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent space energy-based prior model with SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z)
- A Deep Generative Model for Fragment-Based Molecule Generation [21.258861822241272]
We develop a language model for small molecular substructures called fragments.
In other words, we generate molecules fragment by fragment, instead of atom by atom.
We show experimentally that our model largely outperforms other language model-based competitors.
arXiv Detail & Related papers (2020-02-28T15:55:11Z)
- Multi-Objective Molecule Generation using Interpretable Substructures [38.637412590671865]
Drug discovery aims to find novel compounds with specified chemical property profiles.
The goal is to learn to sample molecules in the intersection of multiple property constraints.
We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales.
arXiv Detail & Related papers (2020-02-08T22:55:37Z)