Energy-Based Coarse-Graining in Molecular Dynamics: A Flow-Based Framework without Data
- URL: http://arxiv.org/abs/2504.20940v2
- Date: Sun, 26 Oct 2025 13:00:52 GMT
- Title: Energy-Based Coarse-Graining in Molecular Dynamics: A Flow-Based Framework without Data
- Authors: Maximilian Stupp, P. S. Koutsourelakis
- Abstract summary: Coarse-grained (CG) models provide an effective route to reducing the complexity of molecular simulations. We introduce a fully data-free, generative framework for CG that directly targets the all-atom Boltzmann distribution. We show that the method captures all relevant modes of the Boltzmann distribution, reconstructs atomic configurations, and automatically learns physically meaningful CG representations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Coarse-grained (CG) models provide an effective route to reducing the complexity of molecular dynamics (MD) simulations, but conventional approaches depend heavily on long all-atom MD trajectories to adequately sample configurational space. This data dependence limits accuracy and generalizability, as unvisited configurations remain excluded from the resulting CG models. We introduce a fully data-free, generative framework for CG that directly targets the all-atom Boltzmann distribution. The model defines a structured latent space comprising slow collective variables, associated with multimodal marginal densities capturing metastable states, and fast variables, represented through simple, unimodal conditional distributions. A learnable, bijective map from latent space to atomistic coordinates enables the automatic and accurate reconstruction of molecular structures. Training relies solely on the interatomic potential and minimizes the reverse Kullback-Leibler (KL) divergence via an energy-based objective. To stabilize optimization and ensure mode coverage, we employ an adaptive tempering scheme that promotes the exploration of diverse configurations. Once trained, the model can generate independent, one-shot equilibrium samples at full atomic resolution. Validation on two synthetic systems, a double-well potential and a Gaussian mixture model, as well as on the benchmark alanine dipeptide, demonstrates that the method captures all relevant modes of the Boltzmann distribution, reconstructs atomic configurations, and automatically learns physically meaningful CG representations. These results suggest a promising, data-free alternative to traditional CG techniques, offering both a principled approach to addressing the long-standing "chicken-and-egg" challenge in coarse-graining and an effective solution to the back-mapping problem by enabling accurate reconstruction of all-atom configurations.
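The energy-based objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional toy setting, a standard normal latent variable, and an affine bijection `x = scale * z + shift` in place of the learned flow. The names `double_well`, `reverse_kl_loss`, `scale`, `shift`, and `beta` are hypothetical, and `beta` is a fixed inverse temperature rather than the adaptive tempering scheme of the paper.

```python
import math
import random


def double_well(x, barrier=1.0):
    """Hypothetical 1D double-well potential U(x) = barrier * (x^2 - 1)^2."""
    return barrier * (x * x - 1.0) ** 2


def reverse_kl_loss(potential, scale, shift, beta=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of the reverse KL divergence, up to a constant.

    Latent z ~ N(0, 1) is pushed through the bijection x = scale * z + shift.
    With target p(x) proportional to exp(-beta * U(x)):
        KL(q || p) = E_z[beta * U(x)] - log|scale| + const,
    where the constant (base entropy and log partition function) does not
    depend on scale or shift. Only the potential is needed, no MD data.
    """
    rng = random.Random(seed)
    energy = 0.0
    for _ in range(n_samples):
        x = scale * rng.gauss(0.0, 1.0) + shift
        energy += beta * potential(x)
    # -log|scale| is the negative log-determinant of the affine map's Jacobian.
    return energy / n_samples - math.log(abs(scale))


# Compare two candidate maps on the double well at full and reduced beta.
for beta in (1.0, 0.5):
    narrow = reverse_kl_loss(double_well, 0.5, 1.0, beta=beta)
    wide = reverse_kl_loss(double_well, 1.0, 0.0, beta=beta)
    print(f"beta={beta}: narrow={narrow:.3f}, wide={wide:.3f}")
```

Lowering `beta` flattens the energy term relative to the log-determinant term, which is one simple way to see why tempering can encourage broader exploration before the target temperature is reached. A single affine map cannot represent two modes, so a practical model would need a more expressive bijection.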
Related papers
- Coarse-Grained Boltzmann Generators [2.8880597165704]
We propose a principled framework that unifies scalable reduced-order modeling with the exactness of importance sampling. CG-BGs act in a coarse-grained coordinate space, using a learned potential of mean force to reweight samples generated by a flow-based model. Our results demonstrate that CG-BGs faithfully capture complex interactions mediated by explicit solvent within highly reduced representations.
arXiv Detail & Related papers (2026-02-11T08:37:13Z) - SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards states, offering theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE) [37.18249990338269]
We propose a Gaussian Mixture Variational Autoencoder (GM-VAE) framework designed to extract physically interpretable representations from high-dimensional scientific data. Unlike conventional VAEs that jointly optimize reconstruction and clustering, our method utilizes a block-coordinate descent strategy. To objectively evaluate the learned representations, we introduce a metric based on graph-Laplacian smoothness, which measures the coherence of physical instability across the latent manifold.
arXiv Detail & Related papers (2025-11-26T20:04:38Z) - Cryo-EM as a Stochastic Inverse Problem [3.7068356204071637]
Cryo-electron microscopy (cryo-EM) enables high-resolution imaging of biomolecules. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. We formulate cryo-EM reconstruction as a stochastic inverse problem (SIP) over probability measures. We solve it numerically, using particles to represent and evolve conformational ensembles.
arXiv Detail & Related papers (2025-09-05T23:35:04Z) - CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction [48.45613121595709]
Cryogenic electron microscopy (cryo-EM) facilitates the determination of macromolecular structures at near-atomic resolution. The core computational task in single-particle cryo-EM is to reconstruct the 3D electrostatic potential of a molecule from noisy 2D projections acquired at unknown orientations. We propose cryoSplat, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation.
arXiv Detail & Related papers (2025-08-06T23:24:43Z) - Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients [0.0]
Score-based Generative Models (SGMs) approximate a data distribution by perturbing it with Gaussian noise and subsequently denoising it via a learned diffusion process. We establish the first non-asymptotic Wasserstein-2 convergence guarantees for SGMs targeting semiconvex distributions with potentially discontinuous gradients.
arXiv Detail & Related papers (2025-05-06T11:17:15Z) - Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks [23.196852966408482]
We propose a graph-based latent diffusion model that enables direct sampling of states from their equilibrium distribution. This allows for the efficient computation of flow statistics without running long and expensive numerical simulations. We apply this method to a range of fluid dynamics tasks, such as predicting pressure on 3D wing models in turbulent flow.
arXiv Detail & Related papers (2025-03-19T13:04:39Z) - Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion [55.95767828747407]
In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model. We present a framework that reduces training variance and provides a provably lower-variance gradient estimator. We also present a practical implementation of this estimator incorporating the loss and sampling procedure through a method we call Orbit Diffusion.
arXiv Detail & Related papers (2025-02-14T03:26:57Z) - On the generalization ability of coarse-grained molecular dynamics models for non-equilibrium processes [6.177038245239759]
We present a data-driven approach for constructing CGMD models that retain certain generalization ability for non-equilibrium processes.
Unlike the conventional CG models based on pre-selected CG variables, the present CG model seeks a set of auxiliary CG variables.
This ensures the distribution of the unresolved variables under a broad range of non-equilibrium conditions approaches the one under equilibrium.
arXiv Detail & Related papers (2024-09-17T19:42:50Z) - Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.
We propose a novel FCM-based clustering model that is capable of automatically learning an appropriate membership degree hyperparameter value.
arXiv Detail & Related papers (2024-05-22T08:15:50Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - Implicit Transfer Operator Learning: Multiple Time-Resolution Surrogates for Molecular Dynamics [8.35780131268962]
We present Implicit Transfer Operator (ITO) Learning, a framework to learn surrogates of the simulation process with multiple time-resolutions.
We also present a coarse-grained CG-SE3-ITO model which can quantitatively model all-atom molecular dynamics.
arXiv Detail & Related papers (2023-05-29T12:19:41Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Latent Space Diffusion Models of Cryo-EM Structures [6.968705314671148]
We train a diffusion model as an expressive, learnable prior in the cryoDRGN framework.
By learning an accurate model of the data distribution, our method unlocks tools in generative modeling, sampling, and distribution analysis.
arXiv Detail & Related papers (2022-11-25T15:17:10Z) - GANs and Closures: Micro-Macro Consistency in Multiscale Modeling [0.0]
We present an approach that couples physics-based simulations and biasing methods for sampling conditional distributions with Machine Learning-based conditional generative adversarial networks.
We show that this framework can improve multiscale SDE dynamical systems sampling, and even shows promise for systems of increasing complexity.
arXiv Detail & Related papers (2022-08-23T03:45:39Z) - GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation [102.85440102147267]
We propose a novel generative model named GeoDiff for molecular conformation prediction.
We show that GeoDiff is superior or comparable to existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-06T09:47:01Z) - Normalizing flows for atomic solids [67.70049117614325]
We present a machine-learning approach, based on normalizing flows, for modelling atomic solids.
We report Helmholtz free energy estimates for cubic and hexagonal ice modelled as monatomic water as well as for a truncated and shifted Lennard-Jones system.
Our results thus demonstrate that normalizing flows can provide high-quality samples and free energy estimates of solids, without the need for multi-staging or for imposing restrictions on the crystal geometry.
arXiv Detail & Related papers (2021-11-16T18:54:49Z) - Moser Flow: Divergence-based Generative Modeling on Manifolds [49.04974733536027]
Moser Flow (MF) is a new class of generative models within the family of continuous normalizing flows (CNF).
MF does not require invoking or backpropagating through an ODE solver during training.
We demonstrate for the first time the use of flow models for sampling from general curved surfaces.
arXiv Detail & Related papers (2021-08-18T09:00:24Z) - Embedded-physics machine learning for coarse-graining and collective variable discovery without data [3.222802562733787]
We present a novel learning framework that consistently embeds underlying physics.
We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field.
We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.
arXiv Detail & Related papers (2020-02-24T10:28:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.