Characterizing the Latent Space of Molecular Deep Generative Models with
Persistent Homology Metrics
- URL: http://arxiv.org/abs/2010.08548v2
- Date: Mon, 7 Jun 2021 16:29:56 GMT
- Title: Characterizing the Latent Space of Molecular Deep Generative Models with
Persistent Homology Metrics
- Authors: Yair Schiff, Vijil Chenthamarakshan, Karthikeyan Natesan Ramamurthy,
Payel Das
- Abstract summary: Variational Autos (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
- Score: 21.95240820041655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models are increasingly becoming integral parts of the in
silico molecule design pipeline and have dual goals of learning the chemical
and structural features that render candidate molecules viable while also being
flexible enough to generate novel designs. Specifically, Variational Auto
Encoders (VAEs) are generative models in which encoder-decoder network pairs
are trained to reconstruct training data distributions in such a way that the
latent space of the encoder network is smooth. Therefore, novel candidates can
be found by sampling from this latent space. However, the scope of
architectures and hyperparameters is vast and choosing the best combination for
in silico discovery has important implications for downstream success.
Therefore, it is important to develop a principled methodology for
distinguishing how well a given generative model is able to learn salient
molecular features. In this work, we propose a method for measuring how well
the latent space of deep generative models is able to encode structural and
chemical features of molecular datasets by correlating latent space metrics
with metrics from the field of topological data analysis (TDA). We apply our
evaluation methodology to a VAE trained on SMILES strings and show that 3D
topology information is consistently encoded throughout the latent space of the
model.
Related papers
- Pullback Flow Matching on Data Manifolds [10.187244125099479]
Pullback Flow Matching (PFM) is a framework for generative modeling on data manifold.
We demonstrate PFM's effectiveness through applications in synthetic, data dynamics and protein sequence data, generating novel proteins with specific properties.
This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.
arXiv Detail & Related papers (2024-10-06T16:41:26Z) - geom2vec: pretrained GNNs as geometric featurizers for conformational dynamics [0.0]
We introduce geom2vec, in which pretrained graph neural networks (GNNs) are used as universal featurizers.
We learn transferable structural representations that capture molecular geometric patterns without further fine-tuning.
arXiv Detail & Related papers (2024-09-30T00:36:06Z) - Generative Modeling of Molecular Dynamics Trajectories [12.255021091552441]
We introduce generative modeling of molecular trajectories as a paradigm for learning flexible multi-task surrogate models of MD from data.
We show such generative models can be adapted to diverse tasks such as forward simulation, transition path sampling, and trajectory upsampling.
arXiv Detail & Related papers (2024-09-26T13:02:28Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-re (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Geometric Latent Diffusion Models for 3D Molecule Generation [172.15028281732737]
Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries.
We propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM)
arXiv Detail & Related papers (2023-05-02T01:07:22Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z) - Augmenting Molecular Deep Generative Models with Topological Data
Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules.
Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.