A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling
- URL: http://arxiv.org/abs/2510.17187v1
- Date: Mon, 20 Oct 2025 06:02:36 GMT
- Title: A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling
- Authors: Alexander Aghili, Andy Bruce, Daniel Sabo, Sanya Murdeshwar, Kevin Bachelor, Ionut Mistreanu, Ashwin Lokapally, Razvan Marinescu
- Abstract summary: We introduce a modular benchmarking framework that systematically evaluates protein MD methods. The framework includes a flexible, lightweight propagator interface that supports arbitrary simulation engines. We contribute a dataset of nine diverse proteins, ranging from 10 to 224 residues, that span a variety of folding complexities.
- Score: 32.505127447635864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of molecular dynamics (MD) methods, including machine-learned dynamics, has outpaced the development of standardized tools for method validation. Objective comparison between simulation approaches is often hindered by inconsistent evaluation metrics, insufficient sampling of rare conformational states, and the absence of reproducible benchmarks. To address these challenges, we introduce a modular benchmarking framework that systematically evaluates protein MD methods using enhanced sampling analysis. Our approach uses weighted ensemble (WE) sampling via The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis (WESTPA), based on progress coordinates derived from Time-lagged Independent Component Analysis (TICA), enabling fast and efficient exploration of protein conformational space. The framework includes a flexible, lightweight propagator interface that supports arbitrary simulation engines, allowing both classical force fields and machine learning-based models. Additionally, the framework offers a comprehensive evaluation suite capable of computing more than 19 different metrics and visualizations across a variety of domains. We further contribute a dataset of nine diverse proteins, ranging from 10 to 224 residues, that span a variety of folding complexities and topologies. Each protein has been extensively simulated at 300 K for one million MD steps per starting point (4 ns). To demonstrate the utility of our framework, we perform validation tests using classic MD simulations with implicit solvent and compare protein conformational sampling using a fully trained versus under-trained CGSchNet model. By standardizing evaluation protocols and enabling direct, reproducible comparisons across MD approaches, our open-source platform lays the groundwork for consistent, rigorous benchmarking across the molecular simulation community.
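The weighted ensemble idea named in the abstract can be illustrated with a minimal sketch. It is not WESTPA's API or the authors' implementation: the `Walker` class, the bin edges, and the per-bin target count are all hypothetical, and the split/merge rule shown is one common textbook variant. Walkers are binned along a one-dimensional progress coordinate (for example, a leading TICA component), then split or merged so that each occupied bin holds a fixed number of walkers while the total statistical weight is conserved.

```python
import bisect
import random
from dataclasses import dataclass


@dataclass
class Walker:
    """Hypothetical walker: a progress-coordinate value and a statistical weight."""
    coord: float
    weight: float


def resample_bin(walkers, target, rng):
    """Split/merge the walkers of one bin until `target` remain, conserving weight."""
    walkers = [Walker(w.coord, w.weight) for w in walkers]  # work on copies
    if not walkers:
        return walkers
    # Split: replace the heaviest walker with two copies of half its weight.
    while len(walkers) < target:
        heavy = max(walkers, key=lambda w: w.weight)
        walkers.remove(heavy)
        half = heavy.weight / 2.0
        walkers += [Walker(heavy.coord, half), Walker(heavy.coord, half)]
    # Merge: combine the two lightest walkers; the survivor is picked with
    # probability proportional to its weight and inherits the summed weight.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w.weight)
        a, b = walkers[0], walkers[1]
        total = a.weight + b.weight
        keep = a if rng.random() < a.weight / total else b
        walkers = [Walker(keep.coord, total)] + walkers[2:]
    return walkers


def resample(walkers, edges, target, rng):
    """Assign walkers to bins defined by sorted `edges`, then resample each bin."""
    bins = {}
    for w in walkers:
        bins.setdefault(bisect.bisect_right(edges, w.coord), []).append(w)
    out = []
    for idx in sorted(bins):
        out.extend(resample_bin(bins[idx], target, rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    start = [Walker(0.1, 0.5), Walker(0.2, 0.3), Walker(1.5, 0.2)]
    for w in resample(start, edges=[1.0], target=2, rng=rng):
        print(w)
```

In a full weighted ensemble run, this resampling step would alternate with short propagation segments from whichever simulation engine is plugged in; the sketch covers only the bookkeeping that keeps rarely visited bins populated without biasing the ensemble weights.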
Related papers
- DynaMate: An Autonomous Agent for Protein-Ligand Molecular Dynamics Simulations [13.253932177045842]
Force field-based molecular dynamics (MD) simulations are indispensable for probing the structure, dynamics, and functions of biomolecular systems.
The technical complexity of MD setup, encompassing parameterization, input preparation, and software configuration, remains a major barrier for widespread and efficient usage.
Here, we present DynaMate, a modular multi-agent framework that autonomously designs and executes complete MD for both protein and protein-ligand systems.
arXiv Detail & Related papers (2025-12-10T19:40:51Z)
- Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture [0.0]
We introduce a dataset based on PFM simulations designed to benchmark and advance ML methods for fracture modeling.
This dataset includes three energy decomposition methods, two boundary conditions, and 1,000 random initial crack configurations for a total of 6,000 simulations.
Our results highlight both the promise and limitations of popular current models, and demonstrate the utility of this dataset as a testbed for advancing machine learning in fracture mechanics research.
arXiv Detail & Related papers (2025-07-09T19:14:56Z)
- Revisit Mixture Models for Multi-Agent Simulation: Experimental Study within a Unified Framework [19.558523263211942]
In multi-agent simulation, the primary challenges include behavioral multimodality and closed-loop distributional shifts.
In this study, we revisit mixture models for generating multimodal agent behaviors, which can cover the mainstream methods.
We introduce a closed-loop sample generation approach tailored for mixture models to mitigate distributional shifts.
arXiv Detail & Related papers (2025-01-28T15:26:25Z)
- Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling.
We introduce Tree Search-based Orchestrated Agents (TOA), where the workflow evolves iteratively during the sequential sampling process.
Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z)
- A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [73.35846234413611]
In drug discovery, molecular dynamics (MD) simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning (ML) surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding dynamics.
We demonstrate the efficiency and effectiveness of NeuralMD, achieving over 1K$\times$ speedup compared to standard numerical MD simulations.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling [23.74897713386661]
The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
arXiv Detail & Related papers (2023-06-05T15:19:06Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Extending Process Discovery with Model Complexity Optimization and Cyclic States Identification: Application to Healthcare Processes [62.997667081978825]
The paper presents an approach to process mining providing semi-automatic support to model optimization.
A model simplification approach is proposed, which essentially abstracts the raw model at the desired granularity.
We aim to demonstrate the capabilities of the technological solution using three datasets from different applications in the healthcare domain.
arXiv Detail & Related papers (2022-06-10T16:20:59Z)
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems [60.249111272844374]
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z)
- Using Machine Learning to Emulate Agent-Based Simulations [0.0]
We evaluate the performance of multiple machine-learning methods as statistical emulators for use in the analysis of agent-based models (ABMs).
We propose that agent-based modelling would benefit from using machine-learning methods for emulation, as this can facilitate more robust sensitivity analyses for the models.
arXiv Detail & Related papers (2020-05-05T11:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.