Fugu-MT Paper Translation (Summary): A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling

Paper summary: A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling

arxiv url: http://arxiv.org/abs/2510.17187v1
Date: Mon, 20 Oct 2025 06:02:36 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:39.325345
Title: A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling
Authors: Alexander Aghili, Andy Bruce, Daniel Sabo, Sanya Murdeshwar, Kevin Bachelor, Ionut Mistreanu, Ashwin Lokapally, Razvan Marinescu
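Abstract summary: This paper proposes a modular benchmarking framework that systematically evaluates protein MD methods. The framework includes a flexible, lightweight propagator interface that supports arbitrary simulation engines. The authors contribute a dataset of nine diverse proteins, ranging from 10 to 224 residues, that span a variety of folding complexities.
Reference score (independently computed attention metric): 32.505127447635864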
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid evolution of molecular dynamics (MD) methods, including machine-learned dynamics, has outpaced the development of standardized tools for method validation. Objective comparison between simulation approaches is often hindered by inconsistent evaluation metrics, insufficient sampling of rare conformational states, and the absence of reproducible benchmarks. To address these challenges, we introduce a modular benchmarking framework that systematically evaluates protein MD methods using enhanced sampling analysis. Our approach uses weighted ensemble (WE) sampling via The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis (WESTPA), based on progress coordinates derived from Time-lagged Independent Component Analysis (TICA), enabling fast and efficient exploration of protein conformational space. The framework includes a flexible, lightweight propagator interface that supports arbitrary simulation engines, allowing both classical force fields and machine learning-based models. Additionally, the framework offers a comprehensive evaluation suite capable of computing more than 19 different metrics and visualizations across a variety of domains. We further contribute a dataset of nine diverse proteins, ranging from 10 to 224 residues, that span a variety of folding complexities and topologies. Each protein has been extensively simulated at 300 K for one million MD steps per starting point (4 ns). To demonstrate the utility of our framework, we perform validation tests using classic MD simulations with implicit solvent and compare protein conformational sampling using a fully trained versus under-trained CGSchNet model. By standardizing evaluation protocols and enabling direct, reproducible comparisons across MD approaches, our open-source platform lays the groundwork for consistent, rigorous benchmarking across the molecular simulation community.
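The weighted ensemble idea mentioned in the abstract (walkers carrying statistical weights, binned along a progress coordinate, then split or merged so each occupied bin holds a fixed number of walkers) can be illustrated with a small toy sketch. This is not WESTPA's API or the authors' implementation: the `Propagator` protocol, `RandomWalkPropagator`, `Walker`, `resample_bin`, and `we_iteration` are all hypothetical names, the 1D coordinate merely stands in for a TICA component, and the random walk stands in for an MD engine or learned model.

```python
import random
from dataclasses import dataclass
from typing import Protocol


class Propagator(Protocol):
    """Hypothetical engine-agnostic interface: advance one walker state."""

    def propagate(self, x: float, n_steps: int) -> float: ...


class RandomWalkPropagator:
    """Toy stand-in for an MD engine or ML model (1D Gaussian random walk)."""

    def __init__(self, step_size: float = 0.1, seed: int = 0) -> None:
        self.step_size = step_size
        self.rng = random.Random(seed)

    def propagate(self, x: float, n_steps: int) -> float:
        for _ in range(n_steps):
            x += self.rng.gauss(0.0, self.step_size)
        return x


@dataclass
class Walker:
    x: float       # progress coordinate (assumed stand-in for a TICA component)
    weight: float  # statistical weight carried by this walker


def resample_bin(walkers, target, rng):
    """Split or merge walkers in one non-empty bin until it holds `target`."""
    walkers = [Walker(w.x, w.weight) for w in walkers]
    # Too few walkers: split the heaviest one into two equal-weight copies.
    while len(walkers) < target:
        big = max(walkers, key=lambda w: w.weight)
        big.weight /= 2.0
        walkers.append(Walker(big.x, big.weight))
    # Too many walkers: merge the two lightest; the survivor is chosen with
    # probability proportional to weight and inherits the combined weight.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w.weight)
        a, b = walkers[0], walkers[1]
        total = a.weight + b.weight
        keep = a if rng.random() < a.weight / total else b
        walkers = [Walker(keep.x, total)] + walkers[2:]
    return walkers


def we_iteration(walkers, propagator, edges, per_bin, n_steps, rng):
    """One toy WE iteration: propagate, bin by coordinate, resample per bin."""
    moved = [Walker(propagator.propagate(w.x, n_steps), w.weight) for w in walkers]
    bins = {}
    for w in moved:
        idx = sum(w.x >= e for e in edges)  # index of the bin containing w.x
        bins.setdefault(idx, []).append(w)
    out = []
    for idx in sorted(bins):
        out.extend(resample_bin(bins[idx], per_bin, rng))
    return out
```

A possible use is to start four walkers of weight 0.25 at `x = 0.0` and call `we_iteration(walkers, RandomWalkPropagator(seed=2), [-0.5, 0.5], 4, 10, random.Random(1))` repeatedly. Splitting and merging only redistribute weight, so the total stays at 1 while sparsely visited bins receive extra walkers. Because `we_iteration` only calls `propagate`, any object with that method could be swapped in, which is the kind of engine independence the abstract attributes to its propagator interface.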
