Attention Based Molecule Generation via Hierarchical Variational Autoencoder
- URL: http://arxiv.org/abs/2402.16854v1
- Date: Thu, 18 Jan 2024 21:45:12 GMT
- Title: Attention Based Molecule Generation via Hierarchical Variational Autoencoder
- Authors: Divahar Sivanesan
- Abstract summary: We show that by combining recurrent neural networks with convolutional networks in a hierarchical manner, we are able to extract autoregressive information from SMILES strings while maintaining signal and long-range dependencies.
This allows for generations with very high validity rates on the order of 95% when reconstructing known molecules.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecule generation is a task made very difficult by the complex ways in which we represent molecules computationally. A common technique in molecular generative modeling is to use SMILES strings with recurrent neural networks built into variational autoencoders, but these suffer from a myriad of issues: vanishing gradients, long-range forgetting, and invalid molecules. In this work, we show that by combining recurrent neural networks with convolutional networks in a hierarchical manner, we are able to extract autoregressive information from SMILES strings while maintaining signal and long-range dependencies. This allows for generations with very high validity rates, on the order of 95%, when reconstructing known molecules. We also observe an average Tanimoto similarity of 0.6 between test-set molecules and their reconstructions, which suggests our method maps between SMILES strings and their learned representations more effectively than prior work using similar methods.
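The Tanimoto similarity quoted in the abstract is a set-overlap score between two molecular fingerprints: the number of shared on-bits divided by the number of on-bits present in either fingerprint. The sketch below is only an illustration of that formula, not the paper's evaluation code. The abstract does not say which fingerprint was used, so the bit sets here are hypothetical stand-ins for whatever a cheminformatics toolkit would produce, and the function names are invented for this example.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def mean_tanimoto(pairs: list[tuple[set[int], set[int]]]) -> float:
    """Average similarity over (original, reconstructed) fingerprint pairs."""
    scores = [tanimoto(a, b) for a, b in pairs]
    return sum(scores) / len(scores)


# Hypothetical fingerprints for one test molecule and its reconstruction.
original = {1, 4, 7, 9, 12}
reconstructed = {1, 4, 7, 15}

# 3 shared bits out of 6 distinct bits in total.
print(tanimoto(original, reconstructed))  # 0.5
```

A score of 1.0 means the two fingerprints are identical and 0.0 means they share no bits, so an average of 0.6 indicates substantial but incomplete structural overlap between inputs and reconstructions.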
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- RGCVAE: Relational Graph Conditioned Variational Autoencoder for Molecule Design [70.59828655929194]
Deep Graph Variational Autoencoders are among the most powerful machine learning tools with which it is possible to address this problem.
We propose RGCVAE, an efficient and effective Graph Variational Autoencoder based on: (i) an encoding network exploiting a new powerful Graph Isomorphism Network; (ii) a novel probabilistic decoding component.
arXiv Detail & Related papers (2023-05-19T14:23:48Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Improving Small Molecule Generation using Mutual Information Machine [0.0]
MolMIM is a probabilistic auto-encoder for small molecule drug discovery.
We demonstrate MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty.
We then utilize CMA-ES, a naive black-box and gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization.
arXiv Detail & Related papers (2022-08-18T18:32:48Z)
- Generative Enriched Sequential Learning (ESL) Approach for Molecular Design via Augmented Domain Knowledge [1.4410716345002657]
Generative machine learning techniques can generate novel chemical structures based on molecular fingerprint representation.
Lack of supervised domain knowledge can mislead the learning procedure, biasing it toward the prevalent molecules observed in the training data.
We alleviated this drawback by augmenting the training data with domain knowledge, e.g. quantitative estimates of the drug-likeness score (QEDs).
arXiv Detail & Related papers (2022-04-05T20:16:11Z)
- Super-resolution in Molecular Dynamics Trajectory Reconstruction with Bi-Directional Neural Networks [0.0]
We explore different machine learning (ML) methodologies to increase the resolution of molecular dynamics trajectories on-demand within a post-processing step.
We have found that Bi-LSTMs are the best performing models; by utilizing the local time-symmetry of thermostated trajectories they can even learn long-range correlations and display high robustness to noisy dynamics across molecular complexity.
arXiv Detail & Related papers (2022-01-02T23:00:30Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder by an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
arXiv Detail & Related papers (2021-10-26T14:14:40Z)
- Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks [4.1201966507589995]
We introduce a weight-sharing constraint when optimizing neural network-based models for different molecular geometries.
We find that this technique can accelerate optimization when considering sets of nuclear geometries of the same molecule by an order of magnitude.
arXiv Detail & Related papers (2021-05-18T08:23:09Z)
- ATOM3D: Tasks On Molecules in Three Dimensions [91.72138447636769]
Deep neural networks have recently gained significant attention.
In this work we present ATOM3D, a collection of both novel and existing datasets spanning several key classes of biomolecules.
We develop three-dimensional molecular learning networks for each of these tasks, finding that they consistently improve performance.
arXiv Detail & Related papers (2020-12-07T20:18:23Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.