Characterizing the Latent Space of Molecular Deep Generative Models with
Persistent Homology Metrics
- URL: http://arxiv.org/abs/2010.08548v2
- Date: Mon, 7 Jun 2021 16:29:56 GMT
- Title: Characterizing the Latent Space of Molecular Deep Generative Models with
Persistent Homology Metrics
- Authors: Yair Schiff, Vijil Chenthamarakshan, Karthikeyan Natesan Ramamurthy,
Payel Das
- Abstract summary: Variational Autos (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
- Score: 21.95240820041655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models are increasingly becoming integral parts of the in
silico molecule design pipeline and have dual goals of learning the chemical
and structural features that render candidate molecules viable while also being
flexible enough to generate novel designs. Specifically, Variational Auto
Encoders (VAEs) are generative models in which encoder-decoder network pairs
are trained to reconstruct training data distributions in such a way that the
latent space of the encoder network is smooth. Therefore, novel candidates can
be found by sampling from this latent space. However, the scope of
architectures and hyperparameters is vast and choosing the best combination for
in silico discovery has important implications for downstream success.
Therefore, it is important to develop a principled methodology for
distinguishing how well a given generative model is able to learn salient
molecular features. In this work, we propose a method for measuring how well
the latent space of deep generative models is able to encode structural and
chemical features of molecular datasets by correlating latent space metrics
with metrics from the field of topological data analysis (TDA). We apply our
evaluation methodology to a VAE trained on SMILES strings and show that 3D
topology information is consistently encoded throughout the latent space of the
model.
Related papers
- Geometric Latent Diffusion Models for 3D Molecule Generation [172.15028281732737]
Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries.
We propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM)
arXiv Detail & Related papers (2023-05-02T01:07:22Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Deep Kernel Methods Learn Better: From Cards to Process Optimization [0.7587345054583298]
We show that DKL with active learning can produce a more compact and smooth latent space.
We demonstrate this behavior using a simple cards data set and extend it to the optimization of domain-generated trajectories in physical systems.
arXiv Detail & Related papers (2023-03-25T20:21:29Z) - CHA2: CHemistry Aware Convex Hull Autoencoder Towards Inverse Molecular
Design [2.169755083801688]
It is impossible to explore the entire search space comprehensively to exploit de novo structures with properties of interest.
To address this challenge, reducing the intractable search space into a lower-dimensional latent volume helps examine molecular candidates more feasibly.
We propose using a convex hall surrounding the top molecules in terms of high QEDs to ensnare a tight subspace in the latent representation as an efficient way to reveal novel molecules with high QEDs.
arXiv Detail & Related papers (2023-02-21T21:05:31Z) - Variational Autoencoding Neural Operators [17.812064311297117]
Unsupervised learning with functional data is an emerging paradigm of machine learning research with applications to computer vision, climate modeling and physical systems.
We present Variational Autoencoding Neural Operators (VANO), a general strategy for making a large class of operator learning architectures act as variational autoencoders.
arXiv Detail & Related papers (2023-02-20T22:34:43Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Semi-Supervised Manifold Learning with Complexity Decoupled Chart
Autoencoders [65.2511270059236]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the theoretical approximation power of such networks that essentially depends on the intrinsic dimension of the data manifold.
arXiv Detail & Related papers (2022-08-22T19:58:03Z) - Augmenting Molecular Deep Generative Models with Topological Data
Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules.
Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z) - Geometry-Aware Hamiltonian Variational Auto-Encoder [0.0]
Variational auto-encoders (VAEs) have proven to be a well suited tool for performing dimensionality reduction by extracting latent variables lying in a potentially much smaller dimensional space than the data.
However, such generative models may perform poorly when trained on small data sets which are abundant in many real-life fields such as medicine.
We argue that such latent space modelling provides useful information about its underlying structure leading to far more meaningfuls, more realistic data-generation and more reliable clustering.
arXiv Detail & Related papers (2020-10-22T08:26:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.