Analysis of training and seed bias in small molecules generated with a
conditional graph-based variational autoencoder -- Insights for practical
AI-driven molecule generation
- URL: http://arxiv.org/abs/2107.08987v1
- Date: Mon, 19 Jul 2021 16:00:05 GMT
- Authors: Seung-gu Kang, Joseph A. Morrone, Jeffrey K. Weber, Wendy D. Cornell
- Abstract summary: We analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE). Our graph-based generative model is shown to excel at producing desired conditioned activities and favorable unconditioned physical properties in generated molecules.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The application of deep learning to generative molecule design has shown
early promise for accelerating lead series development. However, questions
remain concerning how factors like training, dataset, and seed bias impact the
technology's utility to medicinal and computational chemists. In this work, we
analyze the impact of seed and training bias on the output of an
activity-conditioned graph-based variational autoencoder (VAE). Leveraging a
massive, labeled dataset corresponding to the dopamine D2 receptor, our
graph-based generative model is shown to excel in producing desired conditioned
activities and favorable unconditioned physical properties in generated
molecules. We implement an activity swapping method that allows for the
activation, deactivation, or retention of activity of molecular seeds, and we
apply independent deep learning classifiers to verify the generative results.
Overall, we uncover relationships between noise, molecular seeds, and training
set selection across a range of latent-space sampling procedures, providing
important insights for practical AI-driven molecule generation.
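The activity-swapping procedure described above (encode a molecular seed, perturb its latent representation with noise, and decode under a changed activity condition) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the encoder, decoder, feature dimensions, and noise scale are all placeholder assumptions, and a real graph-based VAE would use graph neural networks over molecular structures rather than the toy linear maps shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8  # hypothetical latent size, for illustration only

def encode(seed_features):
    """Stand-in encoder: maps a seed's feature vector to a latent code.
    A real graph VAE would run a GNN over the molecular graph."""
    return np.tanh(seed_features[:LATENT_DIM])

def decode(z, activity_label):
    """Stand-in decoder: conditions generation on the desired activity
    by appending the label to the latent code. A real model would emit
    a molecular graph; here we just return a feature vector."""
    conditioned = np.concatenate([z, [float(activity_label)]])
    return np.tanh(conditioned)

def activity_swap(seed_features, target_active, noise_scale=0.1):
    """Encode a seed, add Gaussian noise in latent space, then decode
    under a swapped activity condition (activate/deactivate/retain)."""
    z = encode(seed_features)
    z_noisy = z + noise_scale * rng.standard_normal(z.shape)
    return decode(z_noisy, target_active)

seed = rng.standard_normal(16)                      # hypothetical seed features
activated = activity_swap(seed, target_active=1)    # condition on "active"
deactivated = activity_swap(seed, target_active=0)  # condition on "inactive"
print(activated.shape)  # (9,)
```

In this sketch, `noise_scale` plays the role of the latent-space sampling spread whose interaction with seed and training bias the paper studies; the generated outputs would then be screened with independent activity classifiers, as the abstract describes.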
Related papers
- UniGEM: A Unified Approach to Generation and Property Prediction for Molecules [33.94641403669206]
We propose UniGEM, the first unified model to successfully integrate molecular generation and property prediction.
Our key innovation lies in a novel two-phase generative process, where predictive tasks are activated in the later stages, after the molecular scaffold is formed.
The principles behind UniGEM hold promise for broader applications, including natural language processing and computer vision.
arXiv Detail & Related papers (2024-10-14T13:58:13Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE improvement of 35.1% on regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Supervised Pretraining for Molecular Force Fields and Properties Prediction [16.86839767858162]
We propose to pretrain neural networks on a dataset of 86 million molecules, with atom charges and 3D geometries as inputs and molecular energies as labels.
Experiments show that, compared to training from scratch, fine-tuning the pretrained model can significantly improve the performance for seven molecular property prediction tasks and two force field tasks.
arXiv Detail & Related papers (2022-11-23T08:36:50Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- A biologically-inspired evaluation of molecular generative machine learning [17.623886600638716]
A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed.
We propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for evaluating generative outputs.
arXiv Detail & Related papers (2022-08-20T11:01:10Z)
- Pre-training Transformers for Molecular Property Prediction Using Reaction Prediction [0.0]
Transfer learning has had a tremendous impact in fields like Computer Vision and Natural Language Processing.
We present a pre-training procedure for molecular representation learning using reaction data.
We show a statistically significant positive effect on 5 of the 12 tasks compared to a non-pre-trained baseline model.
arXiv Detail & Related papers (2022-07-06T14:51:38Z)
- Target-aware Molecular Graph Generation [37.937378787812264]
We propose SiamFlow, which forces the flow to fit the distribution of target sequence embeddings in latent space.
Specifically, we employ an alignment loss and a uniformity loss to bring target sequence embeddings and drug graph embeddings into agreement.
Experiments quantitatively show that our proposed method learns meaningful representations in the latent space for target-aware molecular graph generation.
arXiv Detail & Related papers (2022-02-10T04:31:14Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.