Migration as a Probe: A Generalizable Benchmark Framework for Specialist vs. Generalist Machine-Learned Force Fields
- URL: http://arxiv.org/abs/2509.00090v2
- Date: Fri, 17 Oct 2025 16:12:36 GMT
- Title: Migration as a Probe: A Generalizable Benchmark Framework for Specialist vs. Generalist Machine-Learned Force Fields
- Authors: Yi Cao, Paulette Clancy
- Abstract summary: Machine-learned force fields (MLFFs) are transforming computational materials science by enabling ab initio-level accuracy at molecular dynamics scales. Yet their rapid rise raises a key question: should researchers train specialist models from scratch, fine-tune generalist foundation models, or use hybrid approaches? We introduce a benchmarking framework using defect migration pathways, evaluated through nudged elastic band trajectories, as diagnostic probes. Fine-tuned models substantially outperform from-scratch and zero-shot approaches for kinetic properties but show partial loss of long-range physics.
- Score: 1.572216094651749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine-learned force fields (MLFFs), especially pre-trained foundation models, are transforming computational materials science by enabling ab initio-level accuracy at molecular dynamics scales. Yet their rapid rise raises a key question: should researchers train specialist models from scratch, fine-tune generalist foundation models, or use hybrid approaches? The trade-offs in data efficiency, accuracy, cost, and robustness to out-of-distribution failure remain unclear. We introduce a benchmarking framework using defect migration pathways, evaluated through nudged elastic band trajectories, as diagnostic probes that test both interpolation and extrapolation. Using Cr-doped Sb2Te3 as a representative two-dimensional material, we benchmark multiple training paradigms within the MACE architecture across equilibrium, kinetic (atomic migration), and mechanical (interlayer sliding) tasks. Fine-tuned models substantially outperform from-scratch and zero-shot approaches for kinetic properties but show partial loss of long-range physics. Representational analysis reveals distinct, non-overlapping latent encodings, indicating that different training strategies learn different aspects of system physics. This framework provides practical guidelines for MLFF development and establishes migration-based probes as efficient diagnostics linking performance to learned representations, guiding future uncertainty-aware active learning.
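To make the migration-probe idea concrete, the sketch below shows how such a nudged elastic band (NEB) barrier calculation can be driven by a MACE calculator through ASE. It is a minimal, assumption-laden example: the endpoint structure files, the number of images, and the `mace_mp` foundation-model checkpoint are illustrative placeholders, not the paper's actual Cr-doped Sb2Te3 protocol.

```python
# Minimal NEB migration-barrier probe with a MACE calculator via ASE.
# Endpoint files and the migrating-defect setup are hypothetical placeholders.
from ase.io import read
from ase.mep import NEB            # ase.neb.NEB in older ASE releases
from ase.optimize import BFGS
from mace.calculators import mace_mp

# Pre-relaxed endpoint geometries of the defect migration path (user-provided).
initial = read("initial_defect.xyz")
final = read("final_defect.xyz")

# Chain of images between the endpoints; each image gets its own calculator.
n_images = 7
images = [initial] + [initial.copy() for _ in range(n_images - 2)] + [final]
for image in images:
    # Zero-shot foundation model; swap in a fine-tuned or from-scratch MACE
    # model here to compare training paradigms on the same migration path.
    image.calc = mace_mp(model="medium", device="cpu", default_dtype="float64")

neb = NEB(images, climb=True)      # climbing-image NEB to locate the saddle point
neb.interpolate()
BFGS(neb, logfile="neb.log").run(fmax=0.05)

# Migration barrier: highest image energy relative to the initial state.
energies = [image.get_potential_energy() for image in images]
print(f"Estimated migration barrier: {max(energies) - energies[0]:.3f} eV")
```

Running the same script with different calculators (zero-shot, fine-tuned, or from-scratch models) yields directly comparable barrier estimates, which is the spirit of the migration-as-a-probe benchmark.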
Related papers
- Equivariant Evidential Deep Learning for Interatomic Potentials [55.6997213490859]
Uncertainty quantification (UQ) is critical for assessing the reliability of machine learning interatomic potentials (MLIPs) in molecular dynamics simulations. Existing UQ approaches for MLIPs are often limited by high computational cost or suboptimal performance. We propose Equivariant Evidential Deep Learning for Interatomic Potentials (e2IP), a backbone-agnostic framework that models atomic forces and their uncertainty jointly.
arXiv Detail & Related papers (2026-02-11T02:00:25Z) - Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows [18.240532888032394]
We present REALM (REalistic AI Learning for Multiphysics), a rigorous benchmarking framework designed to test neural surrogates on challenging, application-driven reactive flows. We benchmark over a dozen representative surrogate model families, including spectral operators, convolutional models, Transformers, pointwise operators, and graph/mesh networks. We identify three robust trends: (i) a scaling barrier governed jointly by dimensionality, stiffness, and mesh irregularity, leading to rapidly growing rollout errors; (ii) performance primarily controlled by architectural inductive biases rather than parameter count; and (iii) a persistent gap between nominal accuracy metrics and physically ...
arXiv Detail & Related papers (2025-12-21T05:04:13Z) - From Physics to Machine Learning and Back: Part II - Learning and Observational Bias in PHM [52.64097278841485]
This review examines how incorporating learning and observational biases through physics-informed modeling and data strategies can guide models toward physically consistent and reliable predictions. Fast adaptation methods, including meta-learning and few-shot learning, are reviewed alongside domain generalization techniques.
arXiv Detail & Related papers (2025-09-25T14:15:43Z) - Analysis of Transferability Estimation Metrics for Surgical Phase Recognition [3.3285108719932555]
Fine-tuning pre-trained models has become a cornerstone of modern machine learning, allowing practitioners to achieve high performance with limited labeled data. In surgical video analysis, where expert annotations are especially time-consuming and costly, identifying the most suitable pre-trained model for a downstream task is both critical and challenging. We provide the first comprehensive benchmark of three representative metrics, LogME, H-Score, and TransRate, on two diverse datasets.
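For orientation, the H-Score is the most compact of the three benchmarked metrics to state: it compares between-class feature variance with overall feature redundancy in the frozen pre-trained embedding. Below is a small sketch of that computation, assuming the standard formulation (Bao et al., 2019); the toy data and variable names are illustrative, not the paper's evaluation pipeline.

```python
# Sketch of the H-Score transferability estimate computed from frozen features.
import numpy as np

def h_score(features: np.ndarray, labels: np.ndarray) -> float:
    """features: (n_samples, dim) pre-trained embeddings;
    labels: (n_samples,) integer class labels for the downstream task."""
    features = features - features.mean(axis=0, keepdims=True)
    cov_f = np.cov(features, rowvar=False)

    # Replace every sample by its class-conditional mean feature.
    class_means = np.zeros_like(features)
    for c in np.unique(labels):
        mask = labels == c
        class_means[mask] = features[mask].mean(axis=0)
    cov_g = np.cov(class_means, rowvar=False)

    # Large trace: inter-class variance dominates feature redundancy,
    # suggesting the representation should transfer well.
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))

# Toy usage with random features and labels (illustrative only).
rng = np.random.default_rng(0)
print(h_score(rng.normal(size=(200, 32)), rng.integers(0, 5, size=200)))
```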
arXiv Detail & Related papers (2025-08-22T18:05:33Z) - Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning (USDRL) framework. USDRL consists of a Transformer-based Dense Spatio-Temporal Encoder (DSTE), Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT).
arXiv Detail & Related papers (2025-08-18T02:42:16Z) - Learning Satellite Attitude Dynamics with Physics-Informed Normalising Flow [0.26217304977339473]
We investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics. We train several models on simulated data generated with the Basilisk simulator. We find that PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness.
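As a generic illustration of what "physics-informed" can mean in this setting, the sketch below adds a residual of Euler's rigid-body equations, I dω/dt + ω × (Iω) = τ, as a training penalty on a small network mapping time to angular velocity. The inertia values, network, and torque-free scenario are assumptions for illustration; the paper's normalising-flow architecture and Basilisk-based data are not reproduced here.

```python
# Generic physics-informed residual for rigid-body attitude dynamics (illustrative).
import torch

inertia = torch.diag(torch.tensor([10.0, 12.0, 8.0]))      # assumed principal inertia
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 3))           # t -> omega(t)

def euler_residual(t: torch.Tensor, torque: torch.Tensor) -> torch.Tensor:
    """t: (n, 1) collocation times with requires_grad=True; torque: (n, 3)."""
    w = net(t)
    dw_dt = torch.stack([
        torch.autograd.grad(w[:, i].sum(), t, create_graph=True)[0].squeeze(-1)
        for i in range(3)], dim=-1)
    # Euler's equations: I dw/dt + w x (I w) = tau.
    residual = dw_dt @ inertia + torch.cross(w, w @ inertia, dim=-1) - torque
    return (residual ** 2).mean()

t = torch.linspace(0.0, 10.0, 64).reshape(-1, 1).requires_grad_(True)
physics_loss = euler_residual(t, torque=torch.zeros(64, 3))  # torque-free case
physics_loss.backward()   # gradients flow into the network parameters
```

In practice this term would be weighted and added to a data-fitting loss on the simulated trajectories.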
arXiv Detail & Related papers (2025-08-11T10:50:49Z) - Physics-Informed Multimodal Bearing Fault Classification under Variable Operating Conditions using Transfer Learning [0.46085106405479537]
This study proposes a physics-informed multimodal convolutional neural network (CNN) with a late fusion architecture. The model incorporates a novel physics-informed loss function that penalizes physically implausible predictions. Experiments on the Paderborn University dataset demonstrate that the proposed physics-informed approach consistently outperforms a non-physics-informed baseline.
arXiv Detail & Related papers (2025-08-11T01:32:09Z) - Probing In-Context Learning: Impact of Task Complexity and Model Architecture on Generalization and Efficiency [10.942999793311765]
We investigate in-context learning (ICL) through a meticulous experimental framework that systematically varies task complexity and model architecture. We evaluate four distinct models: a GPT2-style Transformer, a Transformer with the FlashAttention mechanism, a convolutional Hyena-based model, and the Mamba state-space model.
arXiv Detail & Related papers (2025-05-10T00:22:40Z) - Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [51.41246396610475]
This paper aims to predict performance in closed-book question answering (QA) without the help of external tools. We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models. Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics.
arXiv Detail & Related papers (2025-02-06T13:23:53Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Physics-Informed Weakly Supervised Learning for Interatomic Potentials [17.165117198519248]
We introduce a physics-informed, weakly supervised approach for training machine-learned interatomic potentials (MLIPs). We demonstrate reduced energy and force errors -- often lower by a factor of two -- for various baseline models and benchmark data sets. Our approach improves the fine-tuning of foundation models on sparse, highly accurate ab initio data.
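A common label-free way to impose such physics on an MLIP is to require first-order consistency between predicted energies and forces on slightly perturbed configurations: E(x + δ) ≈ E(x) - F(x)·δ. The sketch below illustrates that generic idea only; it is not necessarily the loss proposed in the paper, and `model` is a hypothetical MLIP returning (energy, forces).

```python
# Generic weak-supervision term: first-order energy/force consistency (illustrative).
import torch

def taylor_consistency_loss(model, positions: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """positions: (n_atoms, 3). `model(positions)` is assumed to return
    a scalar energy and an (n_atoms, 3) force tensor."""
    energy, forces = model(positions)
    delta = eps * torch.randn_like(positions)           # small random displacement
    energy_perturbed, _ = model(positions + delta)
    # Since F = -dE/dx, the first-order estimate is E(x) - sum(F * delta).
    predicted = energy - (forces * delta).sum()
    return (energy_perturbed - predicted) ** 2          # deviation beyond O(eps^2)
```

Because the penalty needs no reference energies or forces, it can be evaluated on arbitrary unlabeled configurations, which is one sense in which such supervision is "weak".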
arXiv Detail & Related papers (2024-07-23T12:49:04Z) - Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials [25.091146216183144]
Active learning uses biased or unbiased molecular dynamics (MD) to generate candidate pools.
Existing biased and unbiased MD-simulation methods are prone to miss either rare events or extrapolative regions.
This work demonstrates that MD, when biased by the MLIP's energy uncertainty, simultaneously captures extrapolative regions and rare events.
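The sketch below is a toy, one-dimensional caricature of that idea: an ensemble of perturbed double-well potentials stands in for an MLIP committee, and subtracting a multiple of the committee's spread from the mean energy biases the dynamics toward uncertain (extrapolative) regions and over the barrier. Everything here is schematic and not drawn from the paper.

```python
# Toy uncertainty-biased sampling on a 1-D double well (schematic only).
import numpy as np

rng = np.random.default_rng(1)
coeffs = 1.0 + 0.2 * rng.normal(size=8)            # "committee" of double wells

def ensemble_energies(x):
    return np.array([c * (x**2 - 1.0) ** 2 for c in coeffs])

def biased_force(x, kappa=2.0, h=1e-4):
    def u(z):                                      # mean energy minus uncertainty bias
        e = ensemble_energies(z)
        return e.mean() - kappa * e.std()
    return -(u(x + h) - u(x - h)) / (2 * h)        # numerical gradient

# Overdamped Langevin dynamics on the biased surface, started in the left well.
x, dt, temperature = -1.0, 1e-3, 0.3
trajectory = []
for _ in range(20000):
    x += biased_force(x) * dt + np.sqrt(2 * temperature * dt) * rng.normal()
    trajectory.append(x)

print("fraction of steps in the right-hand well:", np.mean(np.array(trajectory) > 0))
```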
arXiv Detail & Related papers (2023-12-03T14:39:14Z) - Differentiable modeling to unify machine learning and physical models and advance Geosciences [38.92849886903847]
We outline the concepts, applicability, and significance of differentiable geoscientific modeling (DG).
"Differentiable" refers to accurately and efficiently calculating gradients with respect to model variables.
Preliminary evidence suggests that DG offers better interpretability and causal insight than purely data-driven machine learning.
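A minimal, self-contained example of the "differentiable" part, under assumptions unrelated to any specific geoscience model: a toy linear-reservoir recurrence is written in an autodiff framework so the gradient of a misfit with respect to a physical parameter is available and can drive calibration (or coupling to neural components).

```python
# Toy differentiable model: calibrate k in S_{t+1} = S_t + P_t - k * S_t by autodiff.
import torch

precip = torch.rand(100)                       # synthetic forcing
k_true = 0.3
storage, obs = torch.tensor(1.0), []
for p in precip:                               # generate synthetic "observations"
    storage = storage + p - k_true * storage
    obs.append(storage)
obs = torch.stack(obs)

k = torch.tensor(0.1, requires_grad=True)      # physical parameter to calibrate
optimizer = torch.optim.Adam([k], lr=0.05)
for _ in range(200):
    s, sim = torch.tensor(1.0), []
    for p in precip:
        s = s + p - k * s                      # the model itself is differentiable
        sim.append(s)
    loss = ((torch.stack(sim) - obs) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(k))                                # should approach k_true = 0.3
```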
arXiv Detail & Related papers (2023-01-10T15:24:14Z) - Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations [5.138982355658199]
Molecular dynamics (MD) simulation techniques are widely used for various natural science applications.
We benchmark a collection of state-of-the-art (SOTA) machine-learning force field (ML FF) models and illustrate, in particular, that the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics.
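One example of such a simulation metric is structural: a radial distribution function g(r) computed from the generated trajectory, which probes emergent structure rather than per-frame force errors. The sketch below assumes a cubic periodic box and a plain position array; it is illustrative, not the benchmark's exact pipeline.

```python
# Radial distribution function g(r) from an MD trajectory (illustrative).
import numpy as np

def radial_distribution(frames: np.ndarray, box: float, r_max: float, n_bins: int = 100):
    """frames: (n_frames, n_atoms, 3) positions in a cubic box of side `box`;
    keep r_max <= box / 2 so the minimum-image convention is valid."""
    n_frames, n_atoms, _ = frames.shape
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    for frame in frames:
        diff = frame[:, None, :] - frame[None, :, :]
        diff -= box * np.round(diff / box)               # minimum-image convention
        dist = np.linalg.norm(diff, axis=-1)[np.triu_indices(n_atoms, k=1)]
        counts += np.histogram(dist, bins=edges)[0]
    # Normalize by the ideal-gas expectation for each spherical shell.
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    ideal_pairs = n_atoms * (n_atoms - 1) / 2 * shell_volumes / box**3
    r = 0.5 * (edges[:-1] + edges[1:])
    return r, counts / (n_frames * ideal_pairs)
```

Agreement of such distributions (alongside diffusivities and stability measures) with reference simulations is the kind of behavior that force errors alone do not capture.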
arXiv Detail & Related papers (2022-10-13T17:59:03Z) - Learning Physical Dynamics with Subequivariant Graph Neural Networks [99.41677381754678]
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization.
Our model achieves on average over 3% enhancement in contact prediction accuracy across 8 scenarios on Physion and 2X lower rollout MSE on RigidFall.
arXiv Detail & Related papers (2022-10-13T10:00:30Z) - Physics-informed machine learning with differentiable programming for heterogeneous underground reservoir pressure management [64.17887333976593]
Avoiding over-pressurization in subsurface reservoirs is critical for applications like CO2 sequestration and wastewater injection.
Managing these pressures by controlling injection and extraction rates is challenging because of the complex heterogeneity of the subsurface.
We use differentiable programming with a full-physics model and machine learning to determine the fluid extraction rates that prevent over-pressurization.
arXiv Detail & Related papers (2022-06-21T20:38:13Z) - Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.