Improving Molecular Design by Stochastic Iterative Target Augmentation
- URL: http://arxiv.org/abs/2002.04720v3
- Date: Sun, 15 Aug 2021 18:40:15 GMT
- Title: Improving Molecular Design by Stochastic Iterative Target Augmentation
- Authors: Kevin Yang, Wengong Jin, Kyle Swanson, Regina Barzilay, Tommi Jaakkola
- Abstract summary: Generative models in molecular design tend to be richly parameterized, data-hungry neural models.
We propose a surprisingly effective self-training approach for iteratively creating additional molecular targets.
Our approach outperforms the previous state-of-the-art in conditional molecular design by over 10% in absolute gain.
- Score: 38.44457632751997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models in molecular design tend to be richly parameterized,
data-hungry neural models, as they must create complex structured objects as
outputs. Estimating such models from data may be challenging due to the lack of
sufficient training data. In this paper, we propose a surprisingly effective
self-training approach for iteratively creating additional molecular targets.
We first pre-train the generative model together with a simple property
predictor. The property predictor is then used as a likelihood model for
filtering candidate structures from the generative model. Additional targets
are iteratively produced and used in the course of stochastic EM iterations to
maximize the log-likelihood that the candidate structures are accepted. A
simple rejection (re-weighting) sampler suffices to draw posterior samples
since the generative model is already reasonable after pre-training. We
demonstrate significant gains over strong baselines for both unconditional and
conditional molecular design. In particular, our approach outperforms the
previous state-of-the-art in conditional molecular design by over 10% in
absolute gain. Finally, we show that our approach is useful in other domains as
well, such as program synthesis.
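The abstract describes a concrete loop: pre-train, sample candidate targets, filter with the property predictor, retrain on the accepted targets. Below is a minimal sketch in Python, assuming hypothetical `ToyGenerator`/`ToyPredictor` interfaces and illustrative iteration counts and thresholds (none of these come from the paper):
```python
import random

class ToyGenerator:
    """Stand-in for the pre-trained generative model (purely illustrative)."""
    def sample(self, source):
        return source + "".join(random.choice("CNO=") for _ in range(4))

    def fit(self, pairs):
        pass  # a real model would maximize log-likelihood on `pairs` here

class ToyPredictor:
    """Stand-in property predictor used as a likelihood model for filtering."""
    def acceptance_prob(self, source, candidate):
        return random.random()

def iterative_target_augmentation(generator, predictor, train_pairs,
                                  n_iters=5, k=20, threshold=0.5):
    """Self-training loop: sample candidate targets, keep the ones the
    predictor accepts (a simple rejection-style filter), then retrain the
    generator on original plus augmented pairs (the stochastic-EM M-step)."""
    for _ in range(n_iters):
        augmented = []
        for source, _ in train_pairs:
            candidates = [generator.sample(source) for _ in range(k)]
            augmented += [(source, c) for c in candidates
                          if predictor.acceptance_prob(source, c) >= threshold]
        generator.fit(list(train_pairs) + augmented)
    return generator

iterative_target_augmentation(ToyGenerator(), ToyPredictor(), [("CC", "CCO")])
```
The rejection-style filter is viable here for the reason the abstract gives: after pre-training the generator is already a reasonable proposal distribution, so enough candidates are accepted to serve as new targets.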
Related papers
- Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models.
We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
arXiv Detail & Related papers (2024-06-12T21:28:28Z)
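A minimal sketch of the reward-curated iterated retraining that this paper analyzes; `model` and `reward` are hypothetical stand-ins, and keeping a top-scoring fraction is my simplification of "curated according to a reward model":
```python
def curated_retraining(model, reward, n_rounds=3, n_samples=1000, keep_frac=0.2):
    """Each generation retrains on a reward-curated subset of its own samples;
    the paper proves the expected reward of such a procedure is maximized."""
    for _ in range(n_rounds):
        samples = sorted((model.sample() for _ in range(n_samples)),
                         key=reward, reverse=True)
        model.fit(samples[: int(keep_frac * n_samples)])  # keep top-reward samples
    return model
```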
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
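A sketch of the three listed components in PyTorch; the sizes, wiring, and names below are my guesses from the summary, not the paper's architecture:
```python
import torch
import torch.nn as nn

class LatentPromptTransformer(nn.Module):
    """Sketch of the three components: a latent z with a learnable prior,
    a causal Transformer that consumes z as a prompt token, and a
    property head that predicts target properties from the latent."""
    def __init__(self, vocab_size=64, d_model=128, n_props=1):
        super().__init__()
        self.prior_mu = nn.Parameter(torch.zeros(d_model))      # learnable prior
        self.prior_logvar = nn.Parameter(torch.zeros(d_model))
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.prop_head = nn.Linear(d_model, n_props)            # property predictor

    def forward(self, tokens):
        b, t = tokens.shape
        z = self.prior_mu + torch.randn(b, 1, self.prior_mu.numel()) \
            * (0.5 * self.prior_logvar).exp()                   # sample latent
        h = torch.cat([z, self.embed(tokens)], dim=1)           # prepend as prompt
        mask = nn.Transformer.generate_square_subsequent_mask(t + 1)
        h = self.decoder(h, mask=mask)                          # causal attention
        return self.lm_head(h[:, 1:]), self.prop_head(z.squeeze(1))

logits, props = LatentPromptTransformer()(torch.randint(0, 64, (2, 10)))
```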
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that the gradient of an SSL objective with respect to the synthetic samples in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Variational Autoencoding Molecular Graphs with Denoising Diffusion Probabilistic Model [0.0]
We propose a novel deep generative model that incorporates a hierarchical structure into the probabilistic latent vectors.
We demonstrate, through experiments on small datasets of physical properties and activity, that our model can design effective molecular latent vectors for molecular property prediction.
arXiv Detail & Related papers (2023-07-02T17:29:41Z)
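The summary only indicates hierarchical probabilistic latent vectors; below is a generic two-level latent sketch, entirely my construction, with the denoising-diffusion component omitted:
```python
import torch
import torch.nn as nn

class HierarchicalLatentEncoder(nn.Module):
    """Two-level latent hierarchy: a top latent z1 conditions the
    distribution of a lower latent z2 (reparameterized Gaussian sampling)."""
    def __init__(self, d_in=32, d_z=16):
        super().__init__()
        self.top = nn.Linear(d_in, 2 * d_z)           # -> mu1, logvar1
        self.bottom = nn.Linear(d_in + d_z, 2 * d_z)  # conditioned on z1

    def forward(self, x):
        mu1, lv1 = self.top(x).chunk(2, dim=-1)
        z1 = mu1 + torch.randn_like(mu1) * (0.5 * lv1).exp()
        mu2, lv2 = self.bottom(torch.cat([x, z1], dim=-1)).chunk(2, dim=-1)
        z2 = mu2 + torch.randn_like(mu2) * (0.5 * lv2).exp()
        return z1, z2

z1, z2 = HierarchicalLatentEncoder()(torch.randn(4, 32))
```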
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that a model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
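A sketch of the maximum-entropy idea: matching only a mean and covariance, the maximum-entropy feature distribution is Gaussian, so one can sample pseudo-individuals from the aggregates and fit a logistic model. The sampling-based shortcut and synthetic numbers below are mine, not the paper's estimator:
```python
import numpy as np

rng = np.random.default_rng(0)

# Aggregated data: per-group moments and label rates standing in for the
# unobserved individual records (all values synthetic).
groups = [
    {"mean": np.array([0.0, 1.0]), "cov": np.eye(2), "pos_rate": 0.8, "n": 200},
    {"mean": np.array([1.5, -0.5]), "cov": np.eye(2), "pos_rate": 0.2, "n": 200},
]

# Max-entropy hypothesis: sample Gaussian pseudo-individuals per group.
X, y = [], []
for g in groups:
    X.append(rng.multivariate_normal(g["mean"], g["cov"], size=g["n"]))
    y.append(rng.random(g["n"]) < g["pos_rate"])
X, y = np.vstack(X), np.concatenate(y).astype(float)

# Plain logistic regression on the sampled pseudo-data (gradient descent).
w = np.zeros(X.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)
print("learned weights:", w)
```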
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
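A sketch of retrieval-based steering; `generator`, `embed`, and `satisfies` are hypothetical interfaces, and interpolating toward the retrieved exemplars' mean embedding is one plausible reading of "steer", not the paper's exact mechanism:
```python
import numpy as np

def steer_with_retrieval(generator, embed, pool, satisfies, query, k=5, alpha=0.5):
    """Pick the k exemplars from a small pool that satisfy the design
    criteria and are closest to the query, then bias the generator's
    latent toward their mean embedding. No task-specific fine-tuning."""
    exemplars = [m for m in pool if satisfies(m)]
    vecs = np.stack([embed(m) for m in exemplars])
    q = embed(query)
    nearest = vecs[np.argsort(((vecs - q) ** 2).sum(axis=1))[:k]]
    steered = (1 - alpha) * q + alpha * nearest.mean(axis=0)  # interpolate latents
    return generator.decode(steered)
```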
- Score-Based Generative Models for Molecule Generation [0.8808021343665321]
We train a Transformer-based score function on representations of 1.5 million samples from the ZINC dataset.
We use the Moses benchmarking framework to evaluate the generated samples on a suite of metrics.
arXiv Detail & Related papers (2022-03-07T13:46:02Z)
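The block below shows generic denoising score matching, the objective family behind such score-based models; the Transformer score function and ZINC encodings are abstracted to a toy MLP on random vectors:
```python
import torch
import torch.nn as nn

def dsm_loss(score_net, x, sigma=0.1):
    """Denoising score matching: perturb x with Gaussian noise and regress
    the network's output onto the score of the perturbation kernel."""
    noise = torch.randn_like(x) * sigma
    target = -noise / sigma**2          # score of N(x_noisy; x, sigma^2 I)
    return ((score_net(x + noise) - target) ** 2).mean()

# Toy training loop on random "molecule representations".
net = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    loss = dsm_loss(net, torch.randn(32, 16))  # stand-in for ZINC encodings
    opt.zero_grad()
    loss.backward()
    opt.step()
```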
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
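Directly maximizing an expected reward over discrete samples is typically done with the score-function (REINFORCE) estimator; the sketch below shows that generic objective with a mean baseline (the baseline and toy data are my additions, not necessarily the paper's setup):
```python
import torch

def reinforce_loss(log_probs, rewards):
    """Score-function estimator for maximizing E[reward]: weight each
    sample's log-likelihood by its advantage over a mean baseline."""
    baseline = rewards.mean()
    return -((rewards - baseline) * log_probs).mean()

# Toy usage: pretend 8 structures were sampled with these log-probs/rewards.
log_probs = torch.randn(8, requires_grad=True)
rewards = torch.rand(8)
reinforce_loss(log_probs, rewards).backward()  # gradient ascends E[reward]
```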
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.