Related papers: Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics

Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics

URL: http://arxiv.org/abs/2010.08548v2
Date: Mon, 7 Jun 2021 16:29:56 GMT
Title: Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics
Authors: Yair Schiff, Vijil Chenthamarakshan, Karthikeyan Natesan Ramamurthy, Payel Das
Abstract summary: Variational Autos (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions. We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
Score: 21.95240820041655
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep generative models are increasingly becoming integral parts of the in silico molecule design pipeline and have dual goals of learning the chemical and structural features that render candidate molecules viable while also being flexible enough to generate novel designs. Specifically, Variational Auto Encoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions in such a way that the latent space of the encoder network is smooth. Therefore, novel candidates can be found by sampling from this latent space. However, the scope of architectures and hyperparameters is vast and choosing the best combination for in silico discovery has important implications for downstream success. Therefore, it is important to develop a principled methodology for distinguishing how well a given generative model is able to learn salient molecular features. In this work, we propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features of molecular datasets by correlating latent space metrics with metrics from the field of topological data analysis (TDA). We apply our evaluation methodology to a VAE trained on SMILES strings and show that 3D topology information is consistently encoded throughout the latent space of the model.

Related papers

Next Generation Equation-Free Multiscale Modelling of Crowd Dynamics via Machine Learning [0.0]
We propose a combined manifold and machine learning approach to learn the discrete evolution operator for the emergent crowd dynamics in latent spaces.<n>Our approach is a four-stage one, explicitly conserving the mass of the reconstructed dynamics in the high-dimensional space.
arXiv Detail & Related papers (2025-08-05T21:39:18Z)
Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties [55.2480439325792]
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format.<n>For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R2 of 0.87.<n>In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv Detail & Related papers (2025-07-22T04:35:50Z)
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network. We develop a robust decoder that bridges latent embeddings and molecular structures. Experiments show DiffMS outperforms existing models on $textitde novo$ molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
Exploring Discrete Flow Matching for 3D De Novo Molecule Generation [0.0]
Flow matching is a recently proposed generative modeling framework that has achieved impressive performance on a variety of tasks. We present FlowMol-CTMC, an open-source model that achieves state of the art performance for 3D de novo design with fewer learnable parameters than existing methods.
arXiv Detail & Related papers (2024-11-25T18:27:39Z)
Pullback Flow Matching on Data Manifolds [10.187244125099479]
Pullback Flow Matching (PFM) is a framework for generative modeling on data manifold. We demonstrate PFM's effectiveness through applications in synthetic, data dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.
arXiv Detail & Related papers (2024-10-06T16:41:26Z)
geom2vec: pretrained GNNs as geometric featurizers for conformational dynamics [0.0]
We introduce geom2vec, in which pretrained graph neural networks (GNNs) are used as universal featurizers. We learn transferable structural representations that capture molecular geometric patterns without further fine-tuning.
arXiv Detail & Related papers (2024-09-30T00:36:06Z)
Generative Modeling of Molecular Dynamics Trajectories [12.255021091552441]
We introduce generative modeling of molecular trajectories as a paradigm for learning flexible multi-task surrogate models of MD from data. We show such generative models can be adapted to diverse tasks such as forward simulation, transition path sampling, and trajectory upsampling.
arXiv Detail & Related papers (2024-09-26T13:02:28Z)
Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-re (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
Geometric Latent Diffusion Models for 3D Molecule Generation [172.15028281732737]
Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries. We propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM)
arXiv Detail & Related papers (2023-05-02T01:07:22Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation. We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels. We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z)
Augmenting Molecular Deep Generative Models with Topological Data Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules. Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.