Combined Representation and Generation with Diffusive State Predictive Information Bottleneck
- URL: http://arxiv.org/abs/2510.09784v1
- Date: Fri, 10 Oct 2025 18:46:21 GMT
- Title: Combined Representation and Generation with Diffusive State Predictive Information Bottleneck
- Authors: Richard John, Yunrui Qiu, Lukas Herron, Pratyush Tiwary,
- Abstract summary: We combine a time-lagged information bottleneck designed to characterize molecular important representations and a diffusion model in one joint training objective.<n>The resulting protocol, which we term Diffusive State Predictive Information Bottleneck (D-SPIB), enables the balancing of representation learning and generation aims in one flexible architecture.<n>We benchmark D-SPIB on multiple molecular tasks and showcase its potential for exploring physical conditions outside the training set.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative modeling becomes increasingly data-intensive in high-dimensional spaces. In molecular science, where data collection is expensive and important events are rare, compression to lower-dimensional manifolds is especially important for various downstream tasks, including generation. We combine a time-lagged information bottleneck designed to characterize molecular important representations and a diffusion model in one joint training objective. The resulting protocol, which we term Diffusive State Predictive Information Bottleneck (D-SPIB), enables the balancing of representation learning and generation aims in one flexible architecture. Additionally, the model is capable of combining temperature information from different molecular simulation trajectories to learn a coherent and useful internal representation of thermodynamics. We benchmark D-SPIB on multiple molecular tasks and showcase its potential for exploring physical conditions outside the training set.
Related papers
- Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data [0.0]
We introduce Latent Thermodynamic Flows (LaTF), an end-to-end framework that tightly integrates representation learning and generative modeling.<n>LaTF unifies the State Predictive Information Bottleneck (SPIB) with NFs to simultaneously learn low-dimensional latent representations.<n>We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and cluster of Lennard Jones particles.
arXiv Detail & Related papers (2025-07-03T21:02:36Z) - Graph Fourier Neural ODEs: Modeling Spatial-temporal Multi-scales in Molecular Dynamics [38.53044197103943]
GF-NODE integrates a graph Fourier transform for spatial frequency decomposition with a Neural ODE framework for continuous-time evolution.<n>We show that GF-NODE achieves state-of-the-art accuracy while preserving essential geometrical features over extended simulations.<n>These findings highlight the promise of bridging spectral decomposition with continuous-time modeling to improve the robustness and predictive power of MD simulations.
arXiv Detail & Related papers (2024-11-03T15:10:48Z) - MING: A Functional Approach to Learning Molecular Generative Models [46.189683355768736]
This paper introduces a novel paradigm for learning molecule generative models based on functional representations.<n>We propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in the function space.
arXiv Detail & Related papers (2024-10-16T13:02:02Z) - Generative Modeling of Molecular Dynamics Trajectories [12.255021091552441]
We introduce generative modeling of molecular trajectories as a paradigm for learning flexible multi-task surrogate models of MD from data.
We show such generative models can be adapted to diverse tasks such as forward simulation, transition path sampling, and trajectory upsampling.
arXiv Detail & Related papers (2024-09-26T13:02:28Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Synthetic location trajectory generation using categorical diffusion
models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual
Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Augmenting Molecular Deep Generative Models with Topological Data
Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules.
Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z) - Learning Neural Generative Dynamics for Molecular Conformation
Generation [89.03173504444415]
We study how to generate molecule conformations (textiti.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z) - Embedded-physics machine learning for coarse-graining and collective
variable discovery without data [3.222802562733787]
We present a novel learning framework that consistently embeds underlying physics.
We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field.
We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.
arXiv Detail & Related papers (2020-02-24T10:28:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.