Augmenting Molecular Deep Generative Models with Topological Data
Analysis Representations
- URL: http://arxiv.org/abs/2106.04464v1
- Date: Tue, 8 Jun 2021 15:49:21 GMT
- Title: Augmenting Molecular Deep Generative Models with Topological Data
Analysis Representations
- Authors: Yair Schiff, Vijil Chenthamarakshan, Samuel Hoffman, Karthikeyan
Natesan Ramamurthy, Payel Das
- Abstract summary: We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules.
Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
- Score: 21.237758981760784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have emerged as a powerful tool for learning
informative molecular representations and designing novel molecules with
desired properties, with applications in drug discovery and material design.
Deep generative auto-encoders defined over molecular SMILES strings have been a
popular choice for that purpose. However, capturing salient molecular
properties like quantum-chemical energies remains challenging and requires
sophisticated neural net models of molecular graphs or geometry-based
information. As a simpler and more efficient alternative, we present a SMILES
Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA)
representations of molecules, known as persistence images. Our experiments show
that this TDA augmentation enables a SMILES VAE to capture the complex relation
between 3D geometry and electronic properties, and allows generation of novel,
diverse, and valid molecules with geometric features consistent with the
training data, which exhibit a varying range of global electronic structural
properties, such as a small HOMO-LUMO gap - a critical property for designing
organic solar cells. We demonstrate that our TDA augmentation yields better
success in downstream tasks compared to models trained without these
representations and can assist in targeted molecule discovery.
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - SE3Set: Harnessing equivariant hypergraph neural networks for molecular representation learning [27.713870291922333]
We develop an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning.
SE3Set has shown performance on par with state-of-the-art (SOTA) models for small molecule datasets.
It excels on the MD22 dataset, achieving a notable improvement of approximately 20% in accuracy across all molecules.
arXiv Detail & Related papers (2024-05-26T10:43:16Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual
Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - An Equivariant Generative Framework for Molecular Graph-Structure
Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - Geometry-Complete Diffusion for 3D Molecule Generation and Optimization [3.8366697175402225]
We introduce the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation.
GCDM outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings.
We also show that GCDM's geometric features can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules.
arXiv Detail & Related papers (2023-02-08T20:01:51Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Dynamic Molecular Graph-based Implementation for Biophysical Properties
Prediction [9.112532782451233]
We propose a novel approach based on the transformer model utilizing GNNs for characterizing dynamic features of protein-ligand interactions.
Our message passing transformer pre-trains on molecular dynamics data derived from physics-based simulations to learn coordinate construction and make binding probability and affinity predictions.
arXiv Detail & Related papers (2022-12-20T04:21:19Z) - Interpretable Molecular Graph Generation via Monotonic Constraints [19.401468196146336]
Deep graph generative models treat molecule design as graph generation problems.
Existing models have many shortcomings, including poor interpretability and controllability toward desired molecular properties.
This paper proposes new methodologies for molecule generation with interpretable and deep controllable models.
arXiv Detail & Related papers (2022-02-28T08:35:56Z) - Learning Neural Generative Dynamics for Molecular Conformation
Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.