Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials
- URL: http://arxiv.org/abs/2503.14293v2
- Date: Wed, 19 Mar 2025 15:03:39 GMT
- Title: Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials
- Authors: Sakib Matin, Emily Shinkle, Yulia Pimonova, Galen T. Craven, Aleksandra Pachalieva, Ying Wai Li, Kipton Barros, Nicholas Lubbers
- Abstract summary: Machine learning interatomic potentials (MLIPs) are a promising tool to accelerate atomistic simulations and molecular property prediction. The quality of MLIPs depends on the quantity of available training data as well as the quantum chemistry (QC) level of theory used to generate that data. We present an ensemble knowledge distillation (EKD) method to improve MLIP accuracy when trained to energy-only datasets.
- Score: 34.82692226532414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning interatomic potentials (MLIPs) are a promising tool to accelerate atomistic simulations and molecular property prediction. The quality of MLIPs strongly depends on the quantity of available training data as well as the quantum chemistry (QC) level of theory used to generate that data. Datasets generated with high-fidelity QC methods, such as coupled cluster, are typically restricted to small molecules and may be missing energy gradients. With this limited quantity of data, it is often difficult to train good MLIP models. We present an ensemble knowledge distillation (EKD) method to improve MLIP accuracy when trained to energy-only datasets. In our EKD approach, first, multiple teacher models are trained to QC energies and then used to generate atomic forces for all configurations in the dataset. Next, a student MLIP is trained to both QC energies and to ensemble-averaged forces generated by the teacher models. We apply this workflow on the ANI-1ccx dataset which consists of organic molecules with configuration energies computed at the coupled cluster level of theory. The resulting student MLIPs achieve new state-of-the-art accuracy on the out-of-sample COMP6 benchmark and improved stability for molecular dynamics simulations. The EKD approach for MLIP is broadly applicable for chemical, biomolecular and materials science simulations.
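The two-stage workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional harmonic "teachers", the finite-difference force, the loss weights `w_energy` and `w_force`, and all function names are assumptions chosen only to keep the example self-contained. A real MLIP would be a neural network whose forces come from automatic differentiation.

```python
from statistics import fmean


def force(energy_fn, r, h=1e-5):
    """Force as the negative energy gradient (central finite difference)."""
    return -(energy_fn(r + h) - energy_fn(r - h)) / (2.0 * h)


def make_harmonic(k, r0):
    """Hypothetical stand-in for a trained model: E(r) = 0.5 * k * (r - r0)^2."""
    return lambda r: 0.5 * k * (r - r0) ** 2


def ensemble_forces(teachers, configs):
    """Step 1: average the teachers' forces for every configuration."""
    return [fmean(force(t, r) for t in teachers) for r in configs]


def student_loss(student, configs, qc_energies, teacher_forces,
                 w_energy=1.0, w_force=1.0):
    """Step 2: penalize errors against QC energies and ensemble-averaged forces."""
    energy_err = fmean((student(r) - e) ** 2
                       for r, e in zip(configs, qc_energies))
    force_err = fmean((force(student, r) - f) ** 2
                      for r, f in zip(configs, teacher_forces))
    return w_energy * energy_err + w_force * force_err


# Toy energy-only dataset: bond lengths with reference energies, no forces.
configs = [0.9, 1.0, 1.1, 1.2]
reference = make_harmonic(k=2.0, r0=1.0)
qc_energies = [reference(r) for r in configs]

# Teachers that differ slightly, as if trained from different random seeds.
teachers = [make_harmonic(k, 1.0) for k in (1.9, 2.0, 2.1)]
mean_forces = ensemble_forces(teachers, configs)

# A student matching the reference scores lower than a mismatched one.
good = student_loss(reference, configs, qc_energies, mean_forces)
bad = student_loss(make_harmonic(3.0, 1.0), configs, qc_energies, mean_forces)
```

In this toy setting the teachers' errors cancel in the average, so the ensemble forces recover the reference gradient. In practice the average only reduces the spread among teachers, and the force term acts as an additional training signal alongside the QC energies.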
Related papers
- PET-MAD, a universal interatomic potential for advanced materials modeling [0.0]
Machine-learning interatomic potentials (MLIPs) have greatly extended the reach of atomic-scale simulations.
We introduce PET-MAD, a generally applicable MLIP trained on a dataset combining stable inorganic and organic solids.
We assess PET-MAD's accuracy on established benchmarks and advanced simulations of six materials.
arXiv Detail & Related papers (2025-03-18T10:35:30Z)
- Excited-state nonadiabatic dynamics in explicit solvent using machine learned interatomic potentials [0.602276990341246]
We use FieldSchNet to replace QM/MM electrostatic embedding with its ML/MM counterpart for nonadiabatic excited state trajectories.
Our results demonstrate that the ML/MM model reproduces the electronic kinetics and structural rearrangements of QM/MM surface hopping reference simulations.
arXiv Detail & Related papers (2025-01-28T14:14:43Z)
- Predicting ionic conductivity in solids from the machine-learned potential energy landscape [68.25662704255433]
We propose an approach for the quick and reliable screening of ionic conductors through the analysis of a universal interatomic potential.
Eight out of the ten highest-ranked materials are confirmed to be superionic at room temperature in first-principles calculations.
Our method achieves a speed-up factor of approximately 50 compared to molecular dynamics driven by a machine-learning potential, and is at least 3,000 times faster compared to first-principles molecular dynamics.
arXiv Detail & Related papers (2024-11-11T09:01:36Z)
- Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy [9.81014501502049]
We develop a unified machine learning method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations as training data.
Tested on hydrocarbon molecules, our model outperforms DFT with the widely-used hybrid and double hybrid functionals in computational costs and prediction accuracy of various quantum chemical properties.
arXiv Detail & Related papers (2024-05-09T19:51:27Z)
- Interpolation and differentiation of alchemical degrees of freedom in machine learning interatomic potentials [0.980222898148295]
We report the use of continuous and differentiable alchemical degrees of freedom in atomistic materials simulations.
The proposed method introduces alchemical atoms with corresponding weights into the input graph, alongside modifications to the message-passing and readout mechanisms of MLIPs.
The end-to-end differentiability of MLIPs enables efficient calculation of the gradient of energy with respect to the compositional weights.
arXiv Detail & Related papers (2024-04-16T17:24:22Z)
- QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z)
- Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling [51.83761266429285]
We propose a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them.
Moleformer achieves state-of-the-art on the initial state to relaxed energy prediction of OC20 and is very competitive in QM9 on predicting quantum chemical properties.
arXiv Detail & Related papers (2023-02-02T03:49:57Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
- Automated discovery of a robust interatomic potential for aluminum [4.6028828826414925]
Machine learning (ML) based potentials aim for faithful emulation of quantum mechanics (QM) calculations at drastically reduced computational cost.
We present a highly automated approach to dataset construction using the principles of active learning (AL).
We demonstrate this approach by building an ML potential for aluminum (ANI-Al).
To demonstrate transferability, we perform a 1.3M atom shock simulation, and show that ANI-Al predictions agree very well with DFT calculations on local atomic environments sampled from the nonequilibrium dynamics.
arXiv Detail & Related papers (2020-03-10T19:06:32Z)