Deep Molecular Dreaming: Inverse machine learning for de-novo molecular
design and interpretability with surjective representations
- URL: http://arxiv.org/abs/2012.09712v1
- Date: Thu, 17 Dec 2020 16:34:59 GMT
- Authors: Cynthia Shen, Mario Krenn, Sagi Eppel, Alan Aspuru-Guzik
- Abstract summary: We propose PASITHEA, a gradient-based molecule optimization technique from computer vision.
It exploits gradients by directly reversing the learning process of a neural network trained to predict real-valued chemical properties.
Although our results are preliminary, we observe a shift in the distribution of a chosen property during inverse-training, a clear indication of PASITHEA's viability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-based de-novo design of functional molecules is one of the most
prominent challenges in cheminformatics today. As a result, generative and
evolutionary inverse designs from the field of artificial intelligence have
emerged at a rapid pace, with aims to optimize molecules for a particular
chemical property. These models 'indirectly' explore the chemical space: by
learning latent spaces, policies, or distributions, or by applying mutations to
populations of molecules. However, the recent development of the SELFIES string
representation of molecules, a surjective alternative to SMILES, has made other
techniques possible. Based on SELFIES, we therefore propose
PASITHEA, a direct gradient-based molecule optimization method that applies
inceptionism techniques from computer vision. PASITHEA exploits gradients by
directly reversing the learning process of a neural network trained to predict
real-valued chemical properties. Effectively, this forms an inverse regression
model capable of generating molecular variants optimized for a chosen property.
Although our results are preliminary, we observe a shift in the distribution of
that property during inverse-training, a
clear indication of PASITHEA's viability. A striking property of inceptionism
is that we can directly probe the model's understanding of the chemical space
it was trained on. We expect that extending PASITHEA to larger datasets,
molecules and more complex properties will lead to advances in the design of
new functional molecules as well as the interpretation and explanation of
machine learning models.
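The inverse-training idea in the abstract can be sketched as gradient ascent on the input of a frozen property predictor: train a network to map a molecular encoding to a property value, then hold the weights fixed and update the input instead. The toy one-hidden-layer regressor, dimensions, step size, and iteration count below are illustrative assumptions for a minimal sketch; PASITHEA itself works on one-hot SELFIES encodings and real chemical properties such as logP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained property predictor (hypothetical weights;
# PASITHEA would use a network trained on SELFIES one-hot encodings).
D, H = 16, 8                              # input dim, hidden dim
W1 = rng.normal(0, 0.5, (H, D))
b1 = np.zeros(H)
w2 = rng.normal(0, 0.5, H)
b2 = 0.0

def predict(x):
    """Forward pass: predicted property value for encoding x."""
    h = np.tanh(W1 @ x + b1)
    return w2 @ h + b2

def grad_wrt_input(x):
    """Backprop through the *frozen* network to the input only."""
    h = np.tanh(W1 @ x + b1)
    dh = w2 * (1.0 - h**2)                # chain rule through tanh
    return W1.T @ dh

# "Dreaming": gradient ascent on the input while weights stay fixed,
# nudging the encoding toward a higher predicted property value.
x = rng.normal(0, 0.1, D)
before = predict(x)
for _ in range(200):
    x += 0.05 * grad_wrt_input(x)
after = predict(x)
```

In the real setting the updated continuous input is decoded back to a molecule, which is where SELFIES' surjectivity matters: every decoded string corresponds to a valid molecular graph, so the dreamed input always yields a chemically valid candidate.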
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- MING: A Functional Approach to Learning Molecular Generative Models [46.189683355768736]
This paper introduces a novel paradigm for learning molecule generative models based on functional representations.
We propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in function space.
arXiv Detail & Related papers (2024-10-16T13:02:02Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Do Large Scale Molecular Language Representations Capture Important Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively, when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
- Augmenting Molecular Deep Generative Models with Topological Data Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules.
Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z)
- Reinforcement Learning for Molecular Design Guided by Quantum Mechanics [10.112779201155005]
We present a novel RL formulation for molecular design in coordinates, thereby extending the class of molecules that can be built.
Our reward function is directly based on fundamental physical properties such as the energy, which we approximate via fast quantum-chemical methods.
In our experiments, we show that our agent can efficiently learn to solve these tasks from scratch by working in a translation and rotation invariant state-action space.
arXiv Detail & Related papers (2020-02-18T16:43:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.